Abstract

In  many  important  applications,  a  collection  of  mutually  distrustful  parties  must  perform  private 
computation  over  multisets.  Each  party’s  input  to  the  function  is  his  private  input  multiset.  In  order 
to  protect  these  private  sets,  the  players  perform  privacy-preserving  computation;  that  is,  no  party 
learns  more  information  about  other  parties’  private  input  sets  than  what  can  be  deduced  from  the 
result.  In  this  paper,  we  propose  efficient  techniques  for  privacy-preserving  operations  on  multisets. 
By  employing  the  mathematical  properties  of  polynomials,  we  build  a  framework  of  efficient,  secure, 
and  composable  multiset  operations:  the  union,  intersection,  and  element  reduction  operations.  We 
apply  these  techniques  to  a  wide  range  of  practical  problems,  achieving  more  efficient  results  than 
those  of  previous  work. 
1 Introduction


Private  computation  over  sets  and  multisets  is  required  in  many  important  applications.  In  the 
real  world,  parties  often  resort  to  use  of  a  trusted  third  party,  who  computes  a  fixed  function  on 
all  parties’  private  input  multisets,  or  forgo  the  application  altogether.  This  unconditional  trust 
is  fraught  with  security  risks;  the  trusted  party  may  be  dishonest  or  compromised,  as  it  is  an 
attractive  target.  We  design  efficient  privacy-preserving  techniques  and  protocols  for  computation 
over  multisets  by  mutually  distrustful  parties:  no  party  learns  more  information  about  other  parties’ 
private  input  sets  than  what  can  be  deduced  from  the  result  of  the  computation. 

For  example,  to  determine  which  airline  passengers  appear  on  a  ‘do-not-fly’  list,  the  airline 
must  perform  a  set-intersection  operation  between  its  private  passenger  list  and  the  government’s 
list.  This  is  an  example  of  the  Set-Intersection  problem.  If  a  social  services  organization  needs 
to  determine  the  list  of  people  on  welfare  who  have  cancer,  the  union  of  each  hospital’s  lists  of 
cancer  patients  must  be  calculated  (but  not  revealed),  then  an  intersection  operation  between  the 
unrevealed  list  of  cancer  patients  and  the  welfare  rolls  must  be  performed.  This  problem  may 
be  efficiently  solved  by  composition  of  our  private  union  and  set-intersection  techniques.  Another 
example  is  privacy-preserving  distributed  network  monitoring.  In  this  scenario,  each  node  monitors 
anomalous  local  traffic,  and  a  distributed  group  of  nodes  collectively  identify  popular  anomalous 
behaviors:  behaviors  that  are  identified  by  at  least  a  threshold  t  number  of  monitors.  This  is  an 
example  of  the  Over-Threshold  Set-Union  problem. 

Contributions.  In  this  paper,  we  propose  efficient  techniques  for  privacy-preserving  operations 
on multisets. By building a framework of set operations using polynomial representations and
employing the mathematical properties of polynomials, we design efficient methods to enable
privacy-preserving computation of the union, intersection, and element reduction^1 multiset operations.

An important feature of our privacy-preserving multiset operations is that they can be composed,
and thus enable a wide range of applications. To demonstrate the power of our techniques,
we apply our operations to solve specific problems, including Set-Intersection, Cardinality
Set-Intersection, Over-Threshold Set-Union, and Threshold Set-Union, as well as determining the
Subset relation. Furthermore, we show that our techniques can be used to efficiently compute the
output of any function over multisets expressed in the following grammar, where $s$ represents any
set held by some player and $d \geq 1$:

$T ::= s \mid \mathrm{Rd}_d(T) \mid T \cap T \mid s \cup T \mid T \cup s$

Note that any monotonic function over multisets^2 can be expressed using our grammar, showing
that  our  techniques  have  truly  general  applicability.  Finally,  we  show  that  our  techniques  are 
applicable  even  outside  the  realm  of  set  computation.  As  an  example,  we  describe  how  to  utilize 
our  techniques  to  efficiently  and  privately  evaluate  CNF  boolean  functions. 
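
As a point of reference for the grammar above, the following Python sketch evaluates the three multiset operations in the clear, with no privacy at all. It only illustrates what union, intersection, and element reduction compute. The function names `union`, `intersection`, and `reduce_by`, the use of `collections.Counter` as the multiset type, and the sample data are all illustrative assumptions, not part of the protocols in this paper.

```python
from collections import Counter

def union(s, t):
    """Multiset union: multiplicities add."""
    return s + t

def intersection(s, t):
    """Multiset intersection: minimum of the two multiplicities."""
    return s & t

def reduce_by(s, d):
    """Element reduction Rd_d: an element appearing b times is kept max(b - d, 0) times."""
    return Counter({a: b - d for a, b in s.items() if b > d})

# Hypothetical data: two hospitals' cancer-patient lists and the welfare rolls.
hospital_a = Counter(["alice", "bob"])
hospital_b = Counter(["carol"])
welfare = Counter(["bob", "carol", "dave"])
on_welfare_with_cancer = intersection(union(hospital_a, hospital_b), welfare)

# Three monitors report anomalies; keep those reported at least t = 2 times.
# In this plaintext model, that is the support of Rd_{t-1} applied to the union.
reports = [Counter(["scan", "flood"]), Counter(["scan"]), Counter(["scan", "worm"])]
t = 2
over_threshold = set(reduce_by(sum(reports, Counter()), t - 1))
```

The two computations mirror the welfare and network-monitoring examples given earlier in this section; a private protocol would have to produce the same outputs without revealing the inputs.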

Our protocols are more efficient than the results obtained from previous work. General multiparty
computation is the best previous result for most of the problems that we address in this paper.
Only  the  private  Set-Intersection  problem  and  two-party  Cardinality  Set-Intersection  problem  have 
been  previously  studied  [13].  However,  previous  work  only  provides  protocols  for  3-or-more-party 

^1 The element reduction by $d$, $\mathrm{Rd}_d(A)$, of a multiset $A$ is the multiset composed of the elements of $A$ such that for
every element $a$ that appears in $A$ at least $d' > d$ times, $a$ is included $d' - d$ times in $\mathrm{Rd}_d(A)$.

^2 Any function computed with only intersection and union, without use of an inverse operation.
                                      Communication complexity of
Problem                               our solution       previous solution     general MPC
Set-Intersection (HBC)                O(cnk lg|P|)       O(n^2 k lg|P|) [13]   O(n^2 k polylog(k) lg|P|)
Set-Intersection (Malicious)          O(n^2 k lg|P|)     none                  O(n^3 k polylog(k) lg|P|)
Cardinality Set-Intersection (HBC)    O(n^2 k lg|P|)     none                  O(n^2 k polylog(k) lg|P|)
Over-Threshold Set-Union (HBC)        O(n^2 k lg|P|)     none                  O(n^3 k polylog(nk) lg|P|)
Threshold Set-Union (HBC)             O(n^2 k lg|P|)     none                  O(n^3 k polylog(nk) lg|P|)
Subset (HBC)                          O(k lg|P|)         none                  O(k polylog(k) lg|P|)

Table 1: Total communication complexity comparison for our multiparty protocols, previous
solutions, and general multiparty computation. There are $n \geq 2$ players, $c < n$ dishonestly colluding,
each with an input multiset of size $k$. The domain of the multiset elements is $P$. Security parameters
are not included in the communication complexity.


Set-Intersection  secure  only  against  honest-but-curious  players;  it  is  not  obvious  how  to  extend 
this  work  to  achieve  security  against  malicious  players.  Also,  previous  work  focuses  on  achieving 
results for the Set-Intersection problem in isolation; these techniques cannot be used to compose
set  operations.  In  contrast,  we  provide  efficient  solutions  for  private  multi-party  Set-Intersection 
secure  against  malicious  players,  and  our  multiset  intersection  operator  can  be  easily  composed 
with  other  operations  to  enable  a  wide  range  of  efficient  private  computation  over  multisets.  We 
compare the communication complexity of our protocols with previous work and solutions based
on general multiparty computation in Table 1. Note that the techniques utilized to create
the  circuits  for  the  general  solution  are  both  complex  and  incur  very  large  constants,  on  top  of 
the  constants  inherent  in  the  use  of  general  multiparty  computation  [1];  we  thus  achieve  greater 
practical  efficiency,  as  well  as  asymptotic  efficiency. 

Our  protocols  are  provably  secure  in  the  PPT-bounded  adversary  model.  We  consider  both 
standard  adversary  models:  honest-but-curious  adversaries  (HBC)  and  malicious  adversaries.  For 
protocols  secure  in  the  HBC  model,  we  prove  that  the  information  learned  by  any  coalition  of 
honest-but-curious  players  is  indistinguishable  from  the  information  learned  in  the  ideal  model, 
where  a  trusted  third  party  (TTP)  calculates  the  function.  For  protocols  secure  in  the  malicious 
model,  we  provide  simulation  proofs  showing  that  for  any  strategy  followed  by  a  malicious  coalition 
$\Gamma$ in the real protocol, there is a translated strategy they could follow in the ideal model, such that,
to $\Gamma$, the real execution is computationally indistinguishable from ideal execution.

Outline.  We  discuss  related  work  in  Section  2.  In  Section  3,  we  introduce  our  adversary  models, 
as well as our cryptographic tools. We describe our privacy-preserving set operation techniques in
Section 4. Section 5 gives protocols, secure against honest-but-curious players, and security analysis
for the Set-Intersection and Cardinality Set-Intersection problems. Section 6 gives protocols, secure
against honest-but-curious players, and security analysis for the Over-Threshold Set-Union problem,
as well as for several variants of the Threshold Set-Union problem. We introduce techniques and
protocols  secure  against  malicious  players  for  the  Set-Intersection,  Cardinality  Set-Intersection,  and 
Over-Threshold  Set-Union  problems  in  Section  7.  Finally,  we  discuss  several  additional  applications 
of  our  techniques  in  Section  8,  including  the  subset  protocol,  general  privacy-preserving  computation 
over sets, and evaluation of CNF boolean formulas.


2  Related  Work 

For  most  of  the  privacy-preserving  set  function  problems  we  address  in  this  paper  (except  for 
the  Set-Intersection  problem),  the  best  previously  known  results  are  through  general  multiparty 
computation. General two-party computation was introduced by Yao [28], and general computation
for multiple parties was introduced in [2]. In general multiparty computation, the players share
the values of each input, and cooperatively evaluate the circuit. For each multiplication gate, the
players must cooperate to securely multiply their inputs and re-share the result, requiring $O(n)$
communication for honest-but-curious players and $O(n^2)$ communication for malicious players [16].
Recent results that allow non-interactive private multiplication of shares [8] do not extend to our
adversary model, in which any $c < n$ players may collude. Our results are more efficient than the
general MPC approach; we compare communication complexity in Table 1.

The  most  relevant  work  to  our  paper  is  by  Freedman,  Nissim,  and  Pinkas  (FNP)  [13].  They 
proposed  protocols  for  the  problems  related  to  Set-Intersection,  based  on  the  representation  of 
sets  as  roots  of  a  polynomial  [13].  Their  work  does  not  utilize  properties  of  polynomials  beyond 
evaluation  at  given  points.  We  explore  the  power  of  polynomial  representation  of  multisets,  using 
operations  on  polynomials  to  obtain  composable  privacy-preserving  multisets  operations.  We  give 
a more detailed comparison of our Set-Intersection protocol with FNP in Table 1 and in Section 1.

Much  work  has  been  done  in  designing  solutions  for  privacy-preserving  computation  of  different 
functions.  For  example,  private  equality  testing  is  the  problem  of  set-intersection  for  the  case  in 
which  the  size  of  the  private  input  sets  is  1.  Protocols  for  this  problem  are  proposed  in  [10,  24,  22], 
and fairness is added in [3]. We do not enumerate work on privacy-preserving computation of
other  functions  here,  as  they  address  drastically  different  problems  and  cannot  be  applied  to  our 
setting. 

3  Preliminaries 

The  notation  used  in  this  paper  is  described  in  Appendix  A.  In  this  section,  we  describe  our 
adversary  models  and  the  cryptographic  tools  used  in  this  paper. 

3.1  Adversary  Models 

In this paper, we consider two standard adversary models: honest-but-curious adversaries and
malicious adversaries. We provide intuition and informal definitions of these models; formal definitions
of  these  models  can  be  found  in  [16]. 

Honest-But-Curious  Adversaries.  In  this  model,  all  parties  act  according  to  their  prescribed 
actions  in  the  protocol.  Security  in  this  model  is  straightforward:  no  player  or  coalition  of  c  <  n 
players  (who  cheat  by  sharing  their  private  information)  gains  information  about  other  players’ 
private  input  sets,  other  than  what  can  be  deduced  from  the  result  of  the  protocol.  This  is  formalized 
by  considering  an  ideal  implementation  where  a  trusted  third  party  (TTP)  receives  the  inputs  of  the 
parties  and  outputs  the  result  of  the  defined  function.  We  require  that  in  the  real  implementation 
of the protocol (that is, one without a TTP) each party does not learn more information than in
the  ideal  implementation. 

Malicious Adversaries. In this model, an adversary may behave arbitrarily. In particular, we
cannot hope to prevent malicious parties from refusing to participate in the protocol, choosing
arbitrary values for their private input sets, or aborting the protocol prematurely. Instead, we focus
on the standard security definition (see, e.g., [16]), which captures the correctness and the privacy
issues of the protocol. Informally, the security definition is based on a comparison between the
real protocol and an ideal model with a TTP, where a malicious party may give arbitrary input to
the TTP. The security definition is also limited to the case where at least one of the parties is
honest. Let $\Gamma$ be the set of colluding malicious parties; for any strategy $\Gamma$ can follow in the
real protocol, there is a translated strategy that it could follow in the ideal model, such that, to
$\Gamma$, the real execution is computationally indistinguishable from execution in the ideal model.

3.2  Additively  Homomorphic  Cryptosystem 

In  this  paper  we  utilize  a  semantically  secure  [17],  additively  homomorphic  public-key  cryptosystem. 
Let $E_{pk}(\cdot)$ denote the encryption function with public key $pk$. The cryptosystem supports the
following operations, which can be performed without knowledge of the private key: (1) Given the
encryptions of $a$ and $b$, $E_{pk}(a)$ and $E_{pk}(b)$, we can efficiently compute the encryption of $a + b$,
denoted $E_{pk}(a + b) := E_{pk}(a) +_h E_{pk}(b)$; (2) Given a constant $c$ and the encryption of $a$, $E_{pk}(a)$,
we can efficiently compute the encryption of $ca$, denoted $E_{pk}(c \cdot a) := c \times_h E_{pk}(a)$. When such
operations are performed, we require that the resulting ciphertexts be re-randomized for security.
In  re-randomization,  a  ciphertext  is  transformed  so  as  to  form  an  encryption  of  the  same  plaintext, 
under  a  different  random  string  than  the  one  originally  used.  We  also  require  that  the  homomorphic 
public-key  cryptosystem  support  secure  (n,  n)-threshold  decryption,  i.e.,  the  corresponding  private 
key  is  shared  by  a  group  of  n  players,  and  decryption  must  be  performed  by  all  players  acting 
together. 

In our protocols for the malicious case, we require: (1) the decryption protocol be secure
against malicious players (typically, this is done by requiring each player to prove in zero-knowledge
that he has followed the threshold decryption protocol correctly [15]); (2) efficient construction of
zero-knowledge proofs of plaintext knowledge; (3) optionally, efficient construction of certain
zero-knowledge proofs, as detailed in Section 7.1.

Note that Paillier’s cryptosystem [26] satisfies each of our requirements: it is additively
homomorphic, supports ciphertext re-randomization and threshold decryption (secure in the malicious
case) [11, 12], and allows certain efficient zero-knowledge proofs (standard constructions from [6, 4],
and proof of plaintext knowledge [7]).

In the remainder of this paper, we simply use $E_{pk}(\cdot)$ to denote the encryption function of the
homomorphic cryptosystem which satisfies all the aforementioned properties.
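
To make the two homomorphic operations and re-randomization concrete, the following is a minimal toy sketch of textbook Paillier encryption with generator $n + 1$. It is an illustration under stated assumptions, not the instantiation used in our protocols: the primes are fixed and far too small for any security, decryption uses a single private key rather than $(n, n)$-threshold decryption, and the function names (`keygen`, `add_h`, `mul_h`, `rerandomize`) are hypothetical.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy key generation from two fixed, far-too-small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # inverse of lambda mod n; valid for generator g = n + 1
    return n, (lam, mu)

def _mask(n):
    """Return r^n mod n^2 for a fresh random unit r, i.e. a random encryption of 0."""
    while True:
        r = 1 + secrets.randbelow(n - 1)
        if math.gcd(r, n) == 1:
            return pow(r, n, n * n)

def encrypt(n, m):
    """E(m) = (1 + m*n) * r^n mod n^2."""
    return (1 + (m % n) * n) * _mask(n) % (n * n)

def decrypt(n, sk, c):
    """Single-key decryption: m = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) / n."""
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def rerandomize(n, c):
    """Same plaintext, fresh randomness: multiply by a random encryption of 0."""
    return c * _mask(n) % (n * n)

def add_h(n, c1, c2):
    """E(a) +_h E(b): multiply the ciphertexts, then re-randomize."""
    return rerandomize(n, c1 * c2)

def mul_h(n, k, c):
    """k x_h E(a): raise the ciphertext to the constant k, then re-randomize."""
    return rerandomize(n, pow(c, k % n, n * n))
```

For example, with `n, sk = keygen()`, decrypting `add_h(n, encrypt(n, 20), encrypt(n, 22))` recovers the sum of the two plaintexts modulo `n`.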

3.3  Shuffle  Protocol 

Each player $i$ ($1 \leq i \leq n$) has a private input multiset $V_i$. We define the Shuffle problem as follows:
all players learn the joint multiset $V_1 \cup \cdots \cup V_n$, such that no player or coalition of players $\Gamma$ can
gain a non-negligible advantage in distinguishing, for each element $a \in V_1 \cup \cdots \cup V_n$, an honest
player $i$ ($1 \leq i \leq n$, $i \notin \Gamma$) such that $a \in V_i$.

In several protocols in this paper, we will impose an additional privacy condition on the Shuffle
problem; the multisets $V_1, \ldots, V_n$ are composed of ciphertexts, which must be re-randomized so
that no player may determine which ciphertexts were part of his private input multiset. The
revised problem statement is as follows: all players learn the joint multiset $V_1 \cup \cdots \cup V_n$, such that
no player or coalition of players can gain a non-negligible advantage in distinguishing, for each
element $a \in V_1 \cup \cdots \cup V_n$, an honest player $i$ ($1 \leq i \leq n$) such that $a \in V_i$.

Both variants of the Shuffle protocol can be easily accomplished with standard techniques [5,
18, 9, 14, 25], with communication complexity at most $O(n^2 k)$.

4  Techniques  and  Mathematical  Intuition 

In  this  section,  we  introduce  our  techniques  for  privacy-preserving  computation  of  operations  on 
multisets. 

Problem Setting. Let there be $n$ players. We denote the private input set of player $i$ as $S_i$, and
$|S_i| = k$ ($1 \leq i \leq n$). We denote the $j$th element of set $i$ as $(S_i)_j$. We denote the domain of the
elements in these sets as $P$ ($(S_i)_j \in P$).

Let $R$ denote the plaintext domain $\mathrm{Dom}(E_{pk}(\cdot))$ (in Paillier’s cryptosystem, $R$ is $\mathbb{Z}_N$). We
require that $R$ be sufficiently large that an element $a$ drawn uniformly from $R$ has only negligible
probability of representing an element of $P$, denoted $a \in P$. For example, we could require that
only elements of the form $b = a \,\|\, h(a)$ could represent an element in $P$. That is, there exists an
$a$ of proper length such that $b = a \,\|\, h(a)$. If $|h(\cdot)| = \lg\left(\frac{1}{\epsilon}\right)$, then there is only $\epsilon$ probability that
$a' \leftarrow R$ represents an element in $P$.
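
The encoding $b = a \,\|\, h(a)$ can be sketched as follows. This is only an illustration: the choice of truncated SHA-256 for $h$, the 64-bit element length `ELEM_BITS`, the 40-bit tag length `HASH_BITS` (so $\epsilon = 2^{-40}$), and the function names are all assumptions made for the example and are not specified in this paper.

```python
import hashlib

ELEM_BITS = 64   # assumed bit length of a raw element a
HASH_BITS = 40   # |h(.)| = lg(1/eps), so eps = 2**-40 in this sketch

def h(a):
    """Truncated SHA-256, standing in for the unspecified function h."""
    digest = hashlib.sha256(a.to_bytes(ELEM_BITS // 8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - HASH_BITS)

def encode(a):
    """Return b = a || h(a) as an integer."""
    if not 0 <= a < 1 << ELEM_BITS:
        raise ValueError("element out of range")
    return (a << HASH_BITS) | h(a)

def represents_element(b):
    """True if b has the form a || h(a) for some a of the proper length."""
    a, tag = b >> HASH_BITS, b & ((1 << HASH_BITS) - 1)
    return 0 <= a < (1 << ELEM_BITS) and tag == h(a)

def decode(b):
    """Recover a from a valid encoding b."""
    if not represents_element(b):
        raise ValueError("not a valid encoding")
    return b >> HASH_BITS
```

A value whose tag does not match, such as a valid encoding with its lowest bit flipped, is rejected by `represents_element`; this is the check that lets players discard spurious roots of random polynomials.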

In  this  section,  we  first  give  background  on  polynomial  representation  of  multisets,  as  well  as 
the  mathematical  properties  of  polynomials  that  we  use  in  this  paper.  We  then  introduce  our 
privacy-preserving  (TTP  model)  set  operations  using  polynomial  representations,  then  show  how 
to  achieve  privacy  in  the  real  setting  by  calculating  them  using  encrypted  polynomials.  Finally,  we 
overview  the  applications  of  these  techniques  explored  in  the  rest  of  the  paper. 

4.1  Background:  Polynomial  Rings  and  Polynomial  Representation  of  Sets 

The polynomial ring $R[x]$ consists of all polynomials with coefficients from $R$. Let $f, g \in R[x]$, such
that $f(x) = \sum_{i=0}^{\deg(f)} f[i] x^i$, where $f[i]$ denotes the coefficient of $x^i$ in the polynomial $f$. Let $f + g$
denote the addition of $f$ and $g$, $f * g$ denote the multiplication of $f$ and $g$, and $f^{(d)}$ denote the $d$th
formal derivative of $f$. Note that the formal derivative of $f$ is $\sum_{i=0}^{\deg(f)-1} (i+1) f[i+1] x^i$.

Polynomial Representation of Sets. In this paper, we use polynomials to represent multisets.
Given a multiset $S = \{S_j\}_{1 \leq j \leq k}$, we construct a polynomial representation of $S$, $f \in R[x]$, as
$f(x) = \prod_{1 \leq j \leq k} (x - S_j)$. On the other hand, given a polynomial $f \in R[x]$, we define the multiset
$S$ represented by the polynomial $f$ as follows: an element $a \in S$ if and only if (1) $f(a) = 0$ and
(2) $a$ represents an element from $P$. Note that our polynomial representation naturally handles
multisets: the element $a$ appears in the multiset $b$ times if $(x - a)^b \mid f \wedge (x - a)^{b+1} \nmid f$.
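
The representation above can be sketched in a few lines of Python. As a simplifying assumption, the ring $R$ is replaced by the integers modulo a small prime `P_MOD`, polynomials are stored as coefficient lists with the lowest degree first, and the helper names are hypothetical.

```python
P_MOD = 10007  # small prime standing in for the plaintext ring R (assumption)

def poly_mul(f, g, p=P_MOD):
    """Multiply two coefficient lists (lowest degree first) modulo p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def from_multiset(elems, p=P_MOD):
    """Return f(x) = prod (x - s) over all s in elems, duplicates included."""
    f = [1]
    for s in elems:
        f = poly_mul(f, [(-s) % p, 1], p)
    return f

def evaluate(f, x, p=P_MOD):
    """Horner evaluation of f at x modulo p."""
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % p
    return acc

def multiplicity(f, a, p=P_MOD):
    """Largest b such that (x - a)^b divides f, by repeated synthetic division."""
    b = 0
    while len(f) > 1:
        quot, rem = [], 0
        for c in reversed(f):          # highest-degree coefficient first
            rem = (rem * a + c) % p
            quot.append(rem)
        if quot.pop() != 0:            # last value is the remainder f(a)
            break
        f = list(reversed(quot))       # quotient, lowest degree first again
        b += 1
    return b
```

For the multiset `[3, 3, 5]`, `from_multiset` yields $(x - 3)^2 (x - 5)$, so `multiplicity` reports 2 for the element 3, 1 for the element 5, and 0 for any other value.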

Note  that  previous  work  has  proposed  to  use  polynomials  to  represent  sets  [13]  (as  opposed 
to multisets). However, to the best of our knowledge, previous work has only utilized the
technique of polynomial evaluation for privacy-preserving operations. As a result, previous work is
limited to set intersection and cannot be composed with other set operators. In this paper, we
propose a framework to perform various set operations using polynomial representations and
construct efficient privacy-preserving set operations using the mathematical properties of polynomials.
By  utilizing  polynomial  representations  as  the  intermediate  form  of  representations  of  sets,  our 
framework  allows  arbitrary  composition  of  set  operators  as  outlined  in  our  grammar. 

4.2  Our  Techniques:  Privacy-Preserving  Set  Operations 

In  this  section,  we  construct  algorithms  for  computing  the  polynomial  representation  of  operations 
on  sets,  including  union,  intersection,  and  element  reduction.  We  design  these  algorithms  to  be 
privacy-preserving  in  the  following  sense:  the  polynomial  representation  of  any  operation  result 
reveals  no  more  information  than  the  set  representation  of  the  result.  First,  we  introduce  our 
algorithms  for  computing  the  polynomial  representation  of  set  operations  union,  intersection,  and 
element  reduction  (with  a  trusted  third  party).  We  then  extend  these  techniques  to  encrypted 
polynomials,  allowing  secure  implementation  of  our  techniques  without  a  trusted  third  party.  Note 
that  the  privacy-preserving  set  operations  defined  in  this  section  may  be  arbitrarily  composed  (see 
Section  8.1),  and  constitute  truly  general  techniques. 

4.2.1  Set  Operations  Using  Polynomial  Representations 

In  this  section,  we  introduce  efficient  techniques  for  set  operations  using  polynomial  representations. 
In  particular,  let  /,  g  be  polynomial  representations  of  the  multisets  S,  T.  We  describe  techniques 
to  compute  the  polynomial  representation  of  their  union,  intersection,  and  element  reduction  by 
d.  We  design  our  techniques  so  that  the  polynomial  representation  of  any  operation  result  reveals 
no  more  information  than  the  set  representation  of  the  result.  This  privacy  property  is  formally 
stated  in  Theorems  1,  3,  and  5,  by  comparing  to  the  ideal  model. 

Union. We define the union of multisets $S \cup T$ as the multiset where each element $a$ that appears
in $S$ $b_S \geq 0$ times and $T$ $b_T \geq 0$ times appears in the resulting multiset $b_S + b_T$ times. We compute
the polynomial representation of $S \cup T$ as follows, where $f$ and $g$ are the polynomial representations
of $S$ and $T$ respectively:

$f * g$.

Note that $f * g$ is a polynomial representation of $S \cup T$ because (1) all elements that appear in
either set $S$ or $T$ are preserved: $(f(a) = 0) \wedge (g(b) = 0) \rightarrow ((f * g)(a) = 0) \wedge ((f * g)(b) = 0)$; (2) as
$f(a) = 0 \Leftrightarrow (x - a) \mid f$, duplicate elements from each multiset are preserved: $(f(a) = 0) \wedge (g(a) =
0) \rightarrow (x - a)^2 \mid (f * g)$. In addition, we prove that, given $f * g$, one cannot learn more information
about $S$ and $T$ than what can be deduced from $S \cup T$, as formally stated in the following theorem:

Theorem 1. Let TTP1 be a trusted third party which receives the private input multiset $S_i$ from
player $i$ for $1 \leq i \leq n$, and then returns to every player the union multiset $S_1 \cup \cdots \cup S_n$ directly.
Let TTP2 be another trusted third party, which receives the private input multiset $S_i$ from player $i$
for $1 \leq i \leq n$, and then: (1) calculates the polynomial representation $f_i$ for each $S_i$; (2) computes
and returns to every player $\prod_{i=1}^{n} f_i$.

There exists a PPT translation algorithm such that, to each player, the results of the following
two scenarios are distributed identically: (1) applying translation to the output of TTP1; (2)
returning the output of TTP2 directly.




Proof. Theorem 1 is trivially true. (This theorem is included for completeness.) □
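
The reason Theorem 1 is immediate can be seen in a short sketch: the translation algorithm simply builds the polynomial representation of the union multiset, which is coefficient-for-coefficient the product computed by TTP2. The code assumes, for illustration only, that $R$ is the integers modulo the prime `P_MOD`; the function names `ttp2_union` and `translate` and the sample inputs are hypothetical.

```python
P_MOD = 2**61 - 1  # prime modulus standing in for the plaintext ring R (assumption)

def poly_mul(f, g, p=P_MOD):
    """Multiply two coefficient lists (lowest degree first) modulo p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def from_multiset(elems, p=P_MOD):
    """Return f(x) = prod (x - s) over all s in elems, duplicates included."""
    f = [1]
    for s in elems:
        f = poly_mul(f, [(-s) % p, 1], p)
    return f

def ttp2_union(multisets, p=P_MOD):
    """TTP2: multiply the players' polynomial representations f_1 * ... * f_n."""
    out = [1]
    for s in multisets:
        out = poly_mul(out, from_multiset(s, p), p)
    return out

def translate(union_multiset, p=P_MOD):
    """Translation applied to TTP1's output: represent the union multiset directly."""
    return from_multiset(union_multiset, p)

# Hypothetical private inputs of three players.
players = [[2, 2, 7], [2, 9], [7]]
ttp1_output = sorted(x for s in players for x in s)  # the union, duplicates kept
```

Because polynomial multiplication is commutative, `translate(ttp1_output)` and `ttp2_union(players)` are the same coefficient list, whatever the order in which the factors are multiplied.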


Intersection. We define the intersection of multisets $S \cap T$ as the multiset where each element
$a$ that appears in $S$ $b_S > 0$ times and $T$ $b_T > 0$ times appears in the resulting multiset $\min\{b_S, b_T\}$
times. Let $S$ and $T$ be two multisets of equal size, and $f$ and $g$ be their polynomial representations
respectively. We compute the polynomial representation of $S \cap T$ as:

$f * r + g * s$

where $r, s \leftarrow R^{\beta}[x]$ with $\beta = \deg(f)$, and $R^b[x]$ is the set of all polynomials of degree $0, \ldots, b$ with
coefficients chosen independently and uniformly from $R$: $r = \sum_{i=0}^{\beta} r[i] x^i$ and $s = \sum_{i=0}^{\beta} s[i] x^i$, where
$\forall_{0 \leq i \leq \beta}\ r[i] \leftarrow R$, $\forall_{0 \leq i \leq \beta}\ s[i] \leftarrow R$.

We show below that $f * r + g * s$ is a polynomial representation of $S \cap T$. In addition, we prove
that, given $f * r + g * s$, one cannot learn more information about $S$ and $T$ than what can be
deduced from $S \cap T$, as formally stated in Theorem 3.

First,  we  must  prove  the  following  lemma: 

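Lemma 2. Let $f, g$ be polynomials in $R[x]$ where $R$ is a ring, $\deg(f) = \deg(g) = \alpha$, and
$\gcd(f, g) = 1$. Let $r = \sum_{i=0}^{\beta} r[i] x^i$ and $s = \sum_{i=0}^{\beta} s[i] x^i$, where $\forall_{0 \leq i \leq \beta}\ r[i] \leftarrow R$, $\forall_{0 \leq i \leq \beta}\ s[i] \leftarrow R$
(independently) and $\beta \geq \alpha$.

Let $u = f * r + g * s = \sum_{i=0}^{\alpha+\beta} u[i] x^i$. Then $\forall_{0 \leq i \leq \alpha+\beta}\ u[i]$ are distributed uniformly and
independently over $R$.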

We  give  a  proof  of  Lemma  2  in  Appendix  B. 

By this lemma, $f * r + g * s = \gcd(f, g) * u$, where $u$ is distributed uniformly in $R^{\gamma}[x]$ for
$\gamma = 2\deg(f) - |S \cap T|$. Note that $a$ is a root of $\gcd(f, g)$ and $(x - a)^{\ell_a} \mid \gcd(f, g)$ if and only
if $a$ appears $\ell_a$ times in $S \cap T$. Moreover, because $u$ is distributed uniformly in $R^{\gamma}[x]$, with
overwhelming probability the roots of $u$ do not represent any element from $P$ (as explained in the
beginning of Section 4). Thus, the computed polynomial $f * r + g * s$ is a polynomial representation
of $S \cap T$. Note that this technique for computing the intersection of two multisets can be extended
to simultaneously compute the intersection of an arbitrary number of multisets in a similar manner.
Also, given $f * r + g * s$, one cannot learn more information about $S$ and $T$ than what can be deduced
from $S \cap T$, as formally stated in the following theorem:

Theorem 3. Let TTP1 be a trusted third party which receives the private input multiset $S_i$ from
player $i$ for $1 \leq i \leq n$, and then returns to every player the intersection multiset $S_1 \cap \cdots \cap S_n$
directly. Let TTP2 be another trusted third party, which receives the private input multiset $S_i$ from
player $i$ for $1 \leq i \leq n$, and then: (1) calculates the polynomial representation $f_i$ for each $S_i$; (2)
chooses $r_i \leftarrow R^k[x]$; (3) computes and returns to each player $\sum_{i=1}^{n} f_i * r_i$.

There exists a PPT translation algorithm such that, to each player, the results of the following
two scenarios are distributed identically: (1) applying translation to the output of TTP1; (2)
returning the output of TTP2 directly.

Proof sketch. Let the output of TTP1 be denoted $T$. The translation algorithm operates as follows:
(1) calculates the polynomial representation $g$ of $T$; (2) chooses the random polynomial $u \leftarrow
R^{2k-|T|}[x]$; (3) computes and returns $g * u$. □
Element Reduction. We define the operation of element reduction (by $d$) of multiset $S$ (denoted
$\mathrm{Rd}_d(S)$) as follows: for each element $a$ that appears $b$ times in $S$, it appears $\max\{b - d, 0\}$ times in
the resulting multiset. We compute the polynomial representation of $\mathrm{Rd}_d(S)$ as:

$f^{(d)} * F * r + f * s$

where $r, s \leftarrow R^{\deg(f)}[x]$ and $F$ is any polynomial of degree $d$, such that $\forall_{a \in P}\ F(a) \neq 0$. Note that
a random polynomial of degree $d$ in $R[x]$ has this property with overwhelming probability.

To  show  that  formal  derivative  operation  allows  element  reduction,  we  require  the  following 
lemma: 

Lemma 4. Let $f \in R[x]$, where $R$ is a ring, $d \geq 1$.

1. If $(x - a)^{d+1} \mid f$, then $(x - a) \mid f^{(d)}$.

2. If $(x - a) \mid f$ and $(x - a)^{d+1} \nmid f$, then $(x - a) \nmid f^{(d)}$.

Lemma 4 is a standard result [27]. By this lemma and $\gcd(F, f) = 1$, an element $a$ is a
root of $\gcd(f^{(d)}, f)$ and $(x - a)^{\ell_a} \mid \gcd(f^{(d)}, f)$ if and only if $a$ appears $\ell_a$ times in $\mathrm{Rd}_d(S)$. By
Lemma 2, $f^{(d)} * F * r + f * s = \gcd(f^{(d)}, f) * u$, where $u$ is distributed uniformly in $R^{\gamma}[x]$ for
$\gamma = 2k - |\mathrm{Rd}_d(S)|$, with $k = |S| = \deg(f)$. Thus, with overwhelming probability, any root of $u$
does not represent any element from $P$. Therefore, $f^{(d)} * F * r + f * s$ is a polynomial representation
of $\mathrm{Rd}_d(S)$, and moreover, given $f^{(d)} * F * r + f * s$, one cannot learn more information about $S$
than what can be deduced from $\mathrm{Rd}_d(S)$, as formally stated in the following theorem:

Theorem 5. Let $F$ be a publicly known polynomial of degree $d$ such that $\forall_{a \in P}\ F(a) \neq 0$. Let TTP1
be a trusted third party which receives a private input multiset $S$, and then returns the reduction
multiset $\mathrm{Rd}_d(S)$ directly. Let TTP2 be another trusted third party, which receives a private input
multiset $S$, and then: (1) calculates the polynomial representation $f$ of $S$; (2) chooses $r, s \leftarrow R^k[x]$;
(3) computes and returns $f^{(d)} * F * r + f * s$.

There exists a PPT translation algorithm such that the results of the following two scenarios are
distributed identically: (1) applying translation to the output of TTP1; (2) returning the output of
TTP2 directly.

Proof sketch. Let the output of TTP1 be denoted $T$. The translation algorithm operates as follows:
(1) calculates the polynomial representation $g$ of $T$; (2) chooses the random polynomial $u \leftarrow
R^{2k-|T|}[x]$; (3) computes and returns $g * u$. □
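
The element reduction formula $f^{(d)} * F * r + f * s$ can be exercised on plaintext polynomials as follows. This is a sketch under stated assumptions: $R$ is taken to be the integers modulo the prime `P_MOD`, $F$ is sampled at random inside the function rather than fixed publicly, the helper names are hypothetical, and the multiplicities noted afterwards were checked only for this particular sample multiset.

```python
import secrets

P_MOD = 2**61 - 1  # prime modulus standing in for the plaintext ring R (assumption)

def poly_add(f, g, p=P_MOD):
    """Add two coefficient lists (lowest degree first) modulo p."""
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return [(a + b) % p for a, b in zip(f, g)]

def poly_mul(f, g, p=P_MOD):
    """Multiply two coefficient lists modulo p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def from_multiset(elems, p=P_MOD):
    """Return prod (x - s) over all s in elems, duplicates included."""
    f = [1]
    for s in elems:
        f = poly_mul(f, [(-s) % p, 1], p)
    return f

def derivative(f, d, p=P_MOD):
    """d-th formal derivative: each step maps f[i] x^i to i * f[i] x^(i-1)."""
    for _ in range(d):
        f = [(i * c) % p for i, c in enumerate(f)][1:] or [0]
    return f

def random_poly(b, p=P_MOD):
    """Sample from R^b[x]: b + 1 independent, uniform coefficients."""
    return [secrets.randbelow(p) for _ in range(b + 1)]

def reduce_elements(f, d, p=P_MOD):
    """Compute f^(d) * F * r + f * s for random F of degree d and r, s in R^deg(f)[x]."""
    k = len(f) - 1
    F = random_poly(d - 1, p) + [1 + secrets.randbelow(p - 1)]  # leading coeff nonzero
    r, s = random_poly(k, p), random_poly(k, p)
    left = poly_mul(poly_mul(derivative(f, d, p), F, p), r, p)
    return poly_add(left, poly_mul(f, s, p), p)

def multiplicity(f, a, p=P_MOD):
    """Largest b such that (x - a)^b divides f, by repeated synthetic division."""
    b = 0
    while len(f) > 1:
        quot, rem = [], 0
        for c in reversed(f):
            rem = (rem * a + c) % p
            quot.append(rem)
        if quot.pop() != 0:
            break
        f = list(reversed(quot))
        b += 1
    return b

S = [1, 1, 1, 4, 4, 6]                      # hypothetical input multiset
h1 = reduce_elements(from_multiset(S), 1)   # should represent {1, 1, 4}
h2 = reduce_elements(from_multiset(S), 2)   # should represent {1}
```

For this input, and except with negligible probability over the random choices, `h1` vanishes at 1 with multiplicity 2 and at 4 with multiplicity 1, while `h2` vanishes only at 1 with multiplicity 1, matching $\max\{b - d, 0\}$.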

4.2.2  Operations  with  Encrypted  Polynomials 

In  the  previous  section,  we  prove  the  security  of  our  polynomial-based  multiset  operators  when  the 
polynomial  representation  of  the  result  is  computed  by  a  trusted  third  party  (TTP2).  By  using 
additively  homomorphic  encryption,  we  allow  these  results  to  be  implemented  as  protocols  in  the 
real  world  without  a  trusted  third  party  (i.e.,  the  polynomial  representation  of  the  set  operations  is 
computed  by  the  parties  collectively  without  a  trusted  third  party).  In  the  algorithms  given  above, 
there  are  three  basic  polynomial  operations  that  are  used:  addition,  multiplication,  and  the  formal 
derivative.  We  give  algorithms  in  this  section  for  computation  of  these  operations  with  encrypted 
polynomials. 

For $f \in R[x]$, we represent the encryption of polynomial $f$, $E_{pk}(f)$, as the ordered
list of the encryptions of its coefficients under the additively homomorphic cryptosystem:
$E_{pk}(f[0]), \ldots, E_{pk}(f[\deg(f)])$. Let $f_1$, $f_2$, and $g$ be polynomials in $R[x]$ such that
$f_1(x) = \sum_{i=0}^{\deg(f_1)} f_1[i] x^i$, $f_2(x) = \sum_{i=0}^{\deg(f_2)} f_2[i] x^i$, and $g(x) = \sum_{i=0}^{\deg(g)} g[i] x^i$. Let $a, b \in R$.
Using the homomorphic properties of the homomorphic cryptosystem, we can efficiently perform the following
operations on encrypted polynomials without knowledge of the private key:

•  Sum of encrypted polynomials: given the encryptions of the polynomials $f_1$ and $f_2$, we can
efficiently compute the encryption of the polynomial $g := f_1 + f_2$, by calculating
$E_{pk}(g[i]) := E_{pk}(f_1[i]) +_h E_{pk}(f_2[i])$ ($0 \leq i \leq \max\{\deg(f_1), \deg(f_2)\}$).

•  Product of an unencrypted polynomial and an encrypted polynomial: given a polynomial $f_2$
and the encryption of polynomial $f_1$, we can efficiently compute the encryption of polynomial
$g := f_1 * f_2$ (also denoted $f_2 *_h E_{pk}(f_1)$) by calculating the encryption of each coefficient
$E_{pk}(g[i]) := (f_2[0] \times_h E_{pk}(f_1[i])) +_h (f_2[1] \times_h E_{pk}(f_1[i-1])) +_h \cdots +_h (f_2[i] \times_h E_{pk}(f_1[0]))$
($0 \leq i \leq \deg(f_1) + \deg(f_2)$).

•  Derivative of an encrypted polynomial: given the encryption of polynomial $f_1$, we can
efficiently compute the encryption of polynomial $g := \frac{d}{dx} f_1$, by calculating the encryption of
each coefficient $E_{pk}(g[i]) := (i + 1) \times_h E_{pk}(f_1[i + 1])$ ($0 \leq i \leq \deg(f_1) - 1$).

•  Evaluation of an encrypted polynomial at an unencrypted point: given the encryption of
polynomial $f_1$, we can efficiently compute the encryption of $a := f_1(b)$, by calculating
$E_{pk}(a) := (b^0 \times_h E_{pk}(f_1[0])) +_h (b^1 \times_h E_{pk}(f_1[1])) +_h \cdots +_h (b^{\deg(f_1)} \times_h E_{pk}(f_1[\deg(f_1)]))$.
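The four operations listed above can be sketched concretely. The following is a minimal illustration that assumes a toy, single-key Paillier cryptosystem as the additively homomorphic scheme; the paper only requires some additively homomorphic (threshold) cryptosystem, so the scheme, the tiny key size, and all function names here are assumptions and are not suitable for real use.

```python
import math
import random

# Toy Paillier key from two Mersenne primes (illustration only, insecure size).
P_PRIME, Q_PRIME = (1 << 31) - 1, (1 << 61) - 1
N = P_PRIME * Q_PRIME
N2 = N * N
LAM = (P_PRIME - 1) * (Q_PRIME - 1) // math.gcd(P_PRIME - 1, Q_PRIME - 1)
MU = pow(LAM, -1, N)  # modular inverse; needs Python 3.8+


def enc(m):
    """Encrypt m mod N with generator g = N + 1."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add_h(c1, c2):
    """E(a) +_h E(b) = E(a + b)."""
    return c1 * c2 % N2


def mul_h(k, c):
    """k x_h E(a) = E(k * a) for a plaintext scalar k."""
    return pow(c, k % N, N2)


def enc_poly(f):
    """Encrypt a polynomial given as a low-to-high coefficient list."""
    return [enc(c) for c in f]


def dec_poly(F):
    return [dec(c) for c in F]


def poly_add(F1, F2):
    """Sum of two encrypted polynomials."""
    size = max(len(F1), len(F2))
    pad = lambda F: F + [enc(0) for _ in range(size - len(F))]
    return [add_h(a, b) for a, b in zip(pad(F1), pad(F2))]


def poly_mul_plain(f2, F1):
    """Product of plaintext polynomial f2 and encrypted polynomial F1."""
    out = [enc(0) for _ in range(len(f2) + len(F1) - 1)]
    for j, coeff in enumerate(f2):
        for i, c in enumerate(F1):
            out[i + j] = add_h(out[i + j], mul_h(coeff, c))
    return out


def poly_derivative(F1):
    """Formal derivative of an encrypted polynomial."""
    return [mul_h(i + 1, F1[i + 1]) for i in range(len(F1) - 1)]


def poly_eval(F1, b):
    """Encrypted evaluation of F1 at the plaintext point b."""
    acc = enc(0)
    for i, c in enumerate(F1):
        acc = add_h(acc, mul_h(pow(b, i, N), c))
    return acc
```

As a usage example, encrypting $f_1 = (x-1)(x-2)$ as `enc_poly([2, -3, 1])` and calling `dec(poly_eval(E1, 2))` yields 0, since 2 is a root. Negative coefficients decrypt to their residues modulo `N`.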

It  is  easy  to  see  that  with  the  above  operations  on  encrypted  polynomials,  we  can  allow  the 
computation  of  the  polynomial  representations  of  set  operations  described  in  Section  4.2.1  without 
the  trusted  third  party  (TTP2)  while  enjoying  the  same  security.  We  demonstrate  this  property 
with  concrete  examples  detailed  in  the  remainder  of  this  paper. 

4.3  Overview  of  Applications 

The techniques we introduce for privacy-preserving computations of multiset operations have many
applications. We give several concrete examples that utilize our techniques for specific
privacy-preserving functions on multisets in the following sections.

First,  we  design  efficient  protocols  for  the  Set-Intersection  and  Cardinality  Set-Intersection 
problems,  secure  against  honest-but-curious  adversaries  (Section  5).  We  then  provide  an  efficient 
protocol  for  the  Over-Threshold  Set-Union  problem,  as  well  as  three  variants  of  the  Threshold  Set- 
Union  problem,  secure  against  honest-but-curious  adversaries,  in  Section  6.  We  introduce  tools  and 
protocols,  secure  against  malicious  players,  for  the  Set-Intersection,  Cardinality  Set-Intersection, 
and  Over-Threshold  Set-Union  problems  in  Section  7.  We  propose  an  efficient  protocol  for  the 
Subset  problem  in  Section  8.2. 

More  generally,  our  techniques  allow  private  computation  of  functions  based  on  composition  of 
the  union,  intersection,  and  element  reduction  operators.  We  discuss  techniques  for  this  general 
private  computation  on  multisets  in  Section  8.1. 

Our  techniques  are  widely  applicable,  even  outside  the  realm  of  computation  of  functions  over 
multisets.  As  an  example,  we  show  how  to  apply  our  techniques  to  private  evaluation  of  boolean 
formulas in CNF form in Section 8.3.

Protocol: Set-Intersection-HBC

Input: There are $n \geq 2$ honest-but-curious players, $c < n$ dishonestly colluding, each with
a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$
is the corresponding public key to a homomorphic cryptosystem.

1. Each player $i = 1, \ldots, n$

(a) calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

(b) sends the encryption of the polynomial $f_i$ to players $i + 1, \ldots, i + c$

(c) chooses $c + 1$ polynomials $r_{i,0}, \ldots, r_{i,c} \leftarrow R^k[x]$

(d) calculates the encryption of the polynomial $\phi_i = f_{i-c} * r_{i,c} + \cdots + f_{i-1} * r_{i,1} + f_i * r_{i,0}$,
utilizing the algorithms given in Sec. 4.2.2.

2. Player 1 sends the encryption of the polynomial $\lambda_1 = \phi_1$ to player 2

3. Each player $i = 2, \ldots, n$ in turn

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} + \phi_i$ by utilizing the
algorithms given in Sec. 4.2.2.

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right)$
to all other players.

5. All players perform a group decryption to obtain the polynomial $p$.

Each player $i = 1, \ldots, n$ determines the intersection multiset as follows: for each $a \in S_i$, he
calculates $b$ such that $(x - a)^b \mid p \wedge (x - a)^{b+1} \nmid p$. The element $a$ appears $b$ times in the
intersection multiset.


Figure  1:  Set-Intersection  protocol  for  the  honest-but-curious  case. 


5  Application  I:  Private  Set-Intersection  and  Cardinality  Set- 
Intersection 

In  this  section,  we  design  protocols  for  Set-Intersection  and  Cardinality  Set-Intersection,  secure 
against  a  coalition  of  honest-but-curious  adversaries. 

5.1  Set-Intersection 

Problem Definition. Let there be $n$ parties; each has a private input set $S_i$ ($1 \leq i \leq n$) of size
$k$. We define the Set-Intersection problem as follows: all players learn the intersection of all private
input multisets without gaining any other information; that is, each player learns $S_1 \cap S_2 \cap \cdots \cap S_n$.

Our protocol for the honest-but-curious case is given in Fig. 1. In this protocol, each player $i$
($1 \leq i \leq n$) first calculates a polynomial representation $f_i \in R[x]$ of his input multiset $S_i$. He then
encrypts this polynomial $f_i$, and sends it to $c$ other players $i + 1, \ldots, i + c$. For each encrypted
polynomial $E_{pk}(f_i)$, each player $i + j$ ($0 \leq j \leq c$) chooses a random polynomial $r_{i+j,j} \in R^k[x]$.
Note that at most $c$ players may collude, thus $\sum_{j=0}^{c} r_{i+j,j}$ is both uniformly distributed and known
to no player. They then compute the encrypted polynomial $\left(\sum_{j=0}^{c} r_{i+j,j}\right) *_h E_{pk}(f_i)$. From
these encrypted polynomials, the players compute the encryption of $p = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right)$.
All players engage in group decryption to obtain the polynomial $p$. Thus, by Theorem 3, the
players have privately computed $p$, a polynomial representing the intersection of their private input
multisets. Finally, to reconstruct the multiset represented by polynomial $p$, the player $i$, for each
$a \in S_i$, calculates $b$ such that $(x - a)^b \mid p \wedge (x - a)^{b+1} \nmid p$. The element $a$ appears $b$ times in the
intersection multiset.
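The data flow of the Set-Intersection protocol can be traced in a minimal plaintext simulation. This sketch omits encryption and group decryption entirely, uses the prime field GF(Q) in place of $R$, and draws a single random polynomial per player as a stand-in for the sum $\sum_{j} r_{i+j,j}$; the function names are hypothetical.

```python
import random

Q = (1 << 61) - 1  # prime modulus; GF(Q) stands in for the ring R (assumption)


def poly_mul(f, g):
    """Multiply polynomials given as low-to-high coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % Q
    return out


def poly_add(f, g):
    size = max(len(f), len(g))
    f = f + [0] * (size - len(f))
    g = g + [0] * (size - len(g))
    return [(a + b) % Q for a, b in zip(f, g)]


def from_multiset(elements):
    """Polynomial representation f_i of a player's multiset."""
    f = [1]
    for a in elements:
        f = poly_mul(f, [(-a) % Q, 1])
    return f


def random_poly(degree):
    return [random.randrange(Q) for _ in range(degree + 1)]


def intersection_poly(multisets):
    """p = sum_i f_i * r_i, with r_i standing in for sum_j r_{i+j,j}."""
    k = len(multisets[0])
    p = [0]
    for s in multisets:
        p = poly_add(p, poly_mul(from_multiset(s), random_poly(k)))
    return p


def multiplicity(p, a):
    """Largest b with (x - a)^b dividing p, by repeated synthetic division."""
    b = 0
    while len(p) > 1:
        rows, carry = [], 0
        for coeff in reversed(p):          # Horner scheme, high to low
            carry = (carry * a + coeff) % Q
            rows.append(carry)
        if rows.pop() != 0:                # last entry is the remainder p(a)
            break
        p = rows[::-1]
        b += 1
    return b


def intersection_for_player(p, own_multiset):
    """Each player tests only the elements of his own input."""
    return {a: multiplicity(p, a) for a in set(own_multiset)
            if multiplicity(p, a) > 0}
```

With inputs `[1, 2, 2, 5]`, `[2, 2, 2, 7]`, and `[2, 2, 9, 5]`, every player should recover `{2: 2}`: the element 2 appears twice in the intersection multiset, and no other element survives, except with negligible probability.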

Security  Analysis.  We  show  that  our  protocol  is  correct,  as  each  player  learns  the  appropriate 
answer  set  at  its  termination,  and  secure  in  the  honest-but-curious  model,  as  no  player  gains 
information  that  it  would  not  gain  when  using  its  input  in  the  ideal  model.  A  formal  statement  of 
these  properties  is  as  follows: 

Theorem 6. In the Set-Intersection protocol of Fig. 1, every player learns the intersection of all
players' private inputs, $S_1 \cap S_2 \cap \cdots \cap S_n$, with overwhelming probability.

Theorem 7. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically
secure, with overwhelming probability, in the Set-Intersection protocol of Fig. 1, any coalition
of fewer than $n$ PPT honest-but-curious players learns no more information than would be gained
by using the same private inputs in the ideal model with a trusted third party.

We provide proof sketches for Theorems 6 and 7 in Appendix C.1.

5.2  Cardinality  Set-Intersection 

Problem Definition. We define the Cardinality Set-Intersection problem on sets as follows: each
player learns the number of unique elements in $S_1 \cap \cdots \cap S_n$, without learning any other information.
A variant of this problem is the Cardinality Set-Intersection problem on multisets, which we define
as follows: all players learn $|S_1 \cap \cdots \cap S_n|$, as computed on multisets.

Our protocol for Cardinality Set-Intersection, given in Figure 2, proceeds as our protocol for
Set-Intersection, until the point where all players learn the encryption of $p$, the polynomial
representation of $S_1 \cap \cdots \cap S_n$. Each player $i = 1, \ldots, n$ then evaluates this encrypted polynomial at
each unique element $a \in S_i$, obtaining $\beta_a$, an encryption of $p(a)$. He then blinds each encrypted
evaluation $p(a)$ by calculating $\beta'_a = b_a \times_h \beta_a$ for a random $b_a \leftarrow R$. All players then distribute and shuffle the ciphertexts
$\beta'_a$ constructed by each player, such that all players receive all ciphertexts, without learning their
source. The Shuffle protocol can be easily accomplished with standard techniques [5, 18, 9, 14, 25],
with communication complexity at most $O(n^2 k)$. The players then decrypt these ciphertexts,
finding that $nb$ of the decryptions are 0, implying that there are $b$ unique elements in $S_1 \cap \cdots \cap S_n$.
FNP utilize a variation of this technique [13], but it is not obvious how to construct a multiparty
Cardinality Set-Intersection protocol from their techniques.

Variants. Our protocol can be simply extended to privately compute the Cardinality
Set-Intersection problem on multisets, by utilizing an encoding as follows: any element $a$ that appears $b$
times in a multiset is encoded as the set $\{a \,\|\, 1, \ldots, a \,\|\, b\}$, with each element included only once. Note
that this is a set of equivalent size as the original multiset representation, so this variant preserves
the efficiency of our protocol.

Protocol: Cardinality-HBC

Input: There are $n \geq 2$ honest-but-curious players, $c < n$ dishonestly colluding, each with
a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$
is the corresponding public key to a homomorphic cryptosystem.

1. Each player $i = 1, \ldots, n$

(a) calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

(b) sends the encryption of the polynomial $f_i$ to players $i + 1, \ldots, i + c$

(c) chooses $c + 1$ random polynomials $r_{i,0}, \ldots, r_{i,c} \leftarrow R^k[x]$

(d) calculates the encryption of the polynomial $\phi_i = f_{i-c} * r_{i,c} + \cdots + f_{i-1} * r_{i,1} + f_i * r_{i,0}$,
utilizing the algorithms given in Sec. 4.2.2.

2. Player 1 sends the encrypted polynomial $\lambda_1 = \phi_1$ to player 2

3. Each player $i = 2, \ldots, n$ in turn

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} + \phi_i$ by utilizing the
algorithms given in Sec. 4.2.2.

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right)$
to all other players.

5. Each player $i = 1, \ldots, n$

(a) evaluates the encryption of the polynomial $p$ at each input $(S_i)_j$, obtaining
encrypted elements $E_{pk}(c_{ij})$ where $c_{ij} = p((S_i)_j)$, using the algorithm given in
Sec. 4.2.2.

(b) for each $j = 1, \ldots, k$ chooses a random number $r_{ij} \leftarrow R$ and calculates an
encrypted element $(V_i)_j = r_{ij} \times_h E_{pk}(c_{ij})$

6. All players perform the Shuffle protocol on their private input sets $V_i$, obtaining a
joint set $V$, in which all ciphertexts have been re-randomized.

7. All players $1, \ldots, n$ decrypt each element of the shuffled set $V$

If $nb$ of the decrypted elements from $V$ are 0, then the size of the set intersection is $b$.


Figure  2:  Cardinality  set-intersection  protocol  for  the  honest-but-curious  case. 
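The counting step of the Cardinality Set-Intersection protocol, together with the multiset encoding from the Variants paragraph, can be traced in a minimal plaintext simulation. Encryption, the Shuffle protocol, and group decryption are replaced by plain arithmetic and a local shuffle; the encoding of $a \,\|\, j$ as `a * MAX_COPIES + j`, the bound `MAX_COPIES`, and the use of nonzero blinding factors are assumptions of this sketch.

```python
import random

Q = (1 << 61) - 1      # prime modulus; GF(Q) stands in for the ring R
MAX_COPIES = 1 << 16   # hypothetical bound on how often an element repeats


def poly_mul(f, g):
    """Multiply polynomials given as low-to-high coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % Q
    return out


def poly_add(f, g):
    size = max(len(f), len(g))
    f = f + [0] * (size - len(f))
    g = g + [0] * (size - len(g))
    return [(a + b) % Q for a, b in zip(f, g)]


def from_multiset(elements):
    f = [1]
    for a in elements:
        f = poly_mul(f, [(-a) % Q, 1])
    return f


def random_poly(degree):
    return [random.randrange(Q) for _ in range(degree + 1)]


def evaluate(f, a):
    """Horner evaluation of f at the point a."""
    acc = 0
    for coeff in reversed(f):
        acc = (acc * a + coeff) % Q
    return acc


def encode_multiset(multiset):
    """Encode the j-th copy of a as the distinct set element a || j."""
    seen, encoded = {}, []
    for a in multiset:
        seen[a] = seen.get(a, 0) + 1
        encoded.append(a * MAX_COPIES + seen[a])
    return encoded


def cardinality(multisets):
    """Return |S_1 ∩ ... ∩ S_n| as computed on multisets."""
    sets = [encode_multiset(s) for s in multisets]
    k = max(len(s) for s in sets)
    p = [0]
    for s in sets:  # p = sum_i f_i * r_i, as in the Set-Intersection protocol
        p = poly_add(p, poly_mul(from_multiset(s), random_poly(k)))
    # Each player blinds p(a) for each of his elements; zero stays zero.
    blinded = [random.randrange(1, Q) * evaluate(p, a) % Q
               for s in sets for a in s]
    random.shuffle(blinded)  # stand-in for the Shuffle protocol
    zeros = sum(1 for v in blinded if v == 0)
    return zeros // len(sets)  # nb zeros imply an intersection of size b
```

For the inputs `[1, 2, 2, 5]`, `[2, 2, 2, 7]`, and `[2, 2, 9, 5]`, the three players jointly contribute six zero values, so `cardinality` should return 2, except with negligible probability.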

Security  Analysis.  We  show  that  our  protocol  is  correct,  as  each  player  learns  the  size  of  the 
answer  set  at  its  termination,  and  secure  in  the  honest-but-curious  model,  as  no  player  gains 
information  that  it  would  not  gain  when  using  its  input  in  the  ideal  model.  A  formal  statement  of 
these  properties  is  as  follows: 

Theorem 8. In the Cardinality Set-Intersection protocol of Fig. 2, every player learns the size of
the intersection of all players' private inputs, $|S_1 \cap S_2 \cap \cdots \cap S_n|$, with overwhelming probability.

Theorem 9. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically
secure and that the Shuffle protocol is secure, with overwhelming probability, in the Cardinality
Set-Intersection protocol of Fig. 2, any coalition of fewer than $n$ PPT honest-but-curious players
learns no more information than would be gained by using the same private inputs in the ideal
model with a trusted third party.

We provide proof sketches for Theorems 8 and 9 in Appendix C.2.

5.3 Malicious Case


We  can  extend  our  protocols  in  Figures  1  and  2,  secure  against  honest-but-curious  players,  to 
protocols  secure  against  malicious  adversaries  by  adding  zero-knowledge  proofs  or  using  cut-and- 
choose  to  ensure  security.  We  give  details  of  our  protocols  secure  against  malicious  adversaries 
in Section 7.2. We prove security against malicious parties for these protocols in Appendices C.1
and C.2.

6 Application II: Private Over-Threshold Set-Union and Threshold Set-Union

In  this  section,  we  design  protocols  for  the  Over-Threshold  Set-Union  problem  and  several  variations 
of  the  Threshold  Set-Union  problem,  secure  against  a  coalition  of  honest-but-curious  adversaries. 

6.1  Over-Threshold  Set-Union  Protocol 

Problem Definition. Let there be $n$ players; each has a private input set $S_i$ ($1 \leq i \leq n$) of size
$k$. We define the Over-Threshold Set-Union problem as follows: all players learn which elements
appear in the union of the players' private input multisets at least a threshold number $t$ times,
and the number of times these elements appeared in the union of players' private inputs, without
gaining any other information. For example, assume that $a$ appears in the combined private input
of the players 15 times. If $t = 10$, then all players learn $a$ has appeared 15 times. However, if $t = 16$,
then no player learns $a$ appears in any player's private input. This problem can be computed as
$\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$.

We describe our protocol secure against honest-but-curious players for the Over-Threshold
Set-Union problem in Fig. 3. In this protocol, each player $i$ ($1 \leq i \leq n$) first calculates $f_i$, the
polynomial representation of its input multiset $S_i$. All players then compute the encryption of
polynomial $p = \prod_{i=1}^{n} f_i$, the polynomial representation of $S_1 \cup \cdots \cup S_n$. Players $i = 1, \ldots, c + 1$
then each chooses random polynomials $r_i$, $s_i$, and calculates the encryption of the polynomial
$F * p^{(t-1)} * r_i + p * s_i$ as shown in Fig. 3. All players then calculate the encryption of the polynomial
$\Phi = F * p^{(t-1)} * \left(\sum_{i=1}^{c+1} r_i\right) + p * \left(\sum_{i=1}^{c+1} s_i\right)$ and perform a group decryption to obtain $\Phi$. As at most
$c$ players may dishonestly collude, the polynomials $\sum_{i=1}^{c+1} r_i$, $\sum_{i=1}^{c+1} s_i$ are uniformly distributed and
known to no player. By Theorem 5, $\Phi$ is a polynomial representation of $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$.

Each player $i = 1, \ldots, n$ then chooses $b_{ij} \leftarrow R$ and computes $U_{ij} = b_{ij} \times \Phi((S_i)_j) + (S_i)_j$
($1 \leq j \leq k$). Each element $U_{ij}$ equals $(S_i)_j$ if $(S_i)_j \in \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, and is otherwise
uniformly distributed over $R$. The players then shuffle these elements $U_{ij}$, such that each player
learns all of the elements, but does not learn which player's set they came from. The shuffle can
be easily accomplished with standard techniques [5, 18, 9, 14, 25], with communication complexity
at most $O(n^2 k)$. The multiset formed by those shuffled elements that represent elements of $P$ is
$\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$.
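The Over-Threshold Set-Union computation can likewise be traced in a minimal plaintext simulation. The sketch below performs no encryption, shuffling across parties, or group decryption; it assumes GF(Q) in place of $R$, a toy domain `DOMAIN` of small integers in place of $P$, a hypothetical choice of $F$ with roots outside that domain, and nonzero blinding factors. As in the closing statement of Fig. 3, an element that passes the threshold shows up in the output once per occurrence in the players' inputs.

```python
import random
from collections import Counter

Q = (1 << 61) - 1      # prime modulus; GF(Q) stands in for the ring R
DOMAIN = range(1000)   # toy stand-in for the valid element domain P


def poly_mul(f, g):
    """Multiply polynomials given as low-to-high coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % Q
    return out


def poly_add(f, g):
    size = max(len(f), len(g))
    f = f + [0] * (size - len(f))
    g = g + [0] * (size - len(g))
    return [(a + b) % Q for a, b in zip(f, g)]


def from_multiset(elements):
    f = [1]
    for a in elements:
        f = poly_mul(f, [(-a) % Q, 1])
    return f


def derivative(f):
    return [(i * c) % Q for i, c in enumerate(f)][1:] or [0]


def random_poly(degree):
    return [random.randrange(Q) for _ in range(degree + 1)]


def evaluate(f, a):
    """Horner evaluation of f at the point a."""
    acc = 0
    for coeff in reversed(f):
        acc = (acc * a + coeff) % Q
    return acc


def over_threshold_union(multisets, t):
    """Return the elements appearing at least t times, with their counts."""
    p = [1]
    for s in multisets:                   # p = product of all f_i (the union)
        p = poly_mul(p, from_multiset(s))
    pd = p
    for _ in range(t - 1):                # (t-1)th derivative of p
        pd = derivative(pd)
    # Hypothetical F of degree t - 1 with no roots inside DOMAIN.
    F = from_multiset([Q - 1 - j for j in range(t - 1)])
    beta = len(p) - 1
    phi = poly_add(poly_mul(poly_mul(F, pd), random_poly(beta)),
                   poly_mul(p, random_poly(beta)))
    # U_ij = b_ij * Phi(a) + a: equals a when Phi(a) = 0, random otherwise.
    V = [(random.randrange(1, Q) * evaluate(phi, a) + a) % Q
         for s in multisets for a in s]
    random.shuffle(V)                     # stand-in for the Shuffle protocol
    return Counter(u for u in V if u in DOMAIN)
```

For the inputs `[1, 2, 3]`, `[2, 3, 4]`, `[3, 4, 5]` and threshold $t = 2$, the result should contain 2 twice, 3 three times, and 4 twice, while 1 and 5 are replaced by values that fall outside `DOMAIN` except with negligible probability.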


Security  Analysis.  We  show  that  our  protocol  is  correct,  as  each  player  learns  the  appropriate 
answer  set  at  its  termination,  and  secure  in  the  honest-but-curious  model,  as  no  player  gains 
information  that  it  would  not  gain  when  using  its  input  in  the  ideal  model  with  a  trusted  third 
party. A formal statement of these properties is as follows:

Protocol: Over-Threshold Set-Union-HBC

Input: There are $n \geq 2$ honest-but-curious players, $c < n$ dishonestly colluding, each with
a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$
is the corresponding public key for a homomorphic cryptosystem. The threshold number
of repetitions at which an element appears in the output is $t$. $F$ is a fixed polynomial of
degree $t - 1$ which has no roots representing elements of $P$.

1. Each player $i = 1, \ldots, n$ calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

2. Player 1 sends the encryption of the polynomial $\lambda_1 = f_1$ to player 2

3. Each player $i = 2, \ldots, n$

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} * f_i$ by utilizing the
algorithm given in Sec. 4.2.2.

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \prod_{i=1}^{n} f_i$ to players
$2, \ldots, c + 1$

5. Each player $i = 1, \ldots, c + 1$

(a) calculates the encryption of the $(t - 1)$th derivative of $p$, denoted $p^{(t-1)}$, by
repeating the algorithm given in Sec. 4.2.2.

(b) chooses random polynomials $r_i, s_i \leftarrow R^{nk}[x]$

(c) calculates the encryption of the polynomial $p * s_i + F * p^{(t-1)} * r_i$ and sends it
to all other players.

6. All players perform a group decryption to obtain the polynomial
$\Phi = F * p^{(t-1)} * \left(\sum_{i=1}^{c+1} r_i\right) + p * \left(\sum_{i=1}^{c+1} s_i\right)$

7. Each player $i = 1, \ldots, n$, for each $j = 1, \ldots, k$

(a) chooses a random element $b_{ij} \leftarrow R$

(b) calculates $U_{ij} = b_{ij} \times \Phi((S_i)_j) + (S_i)_j$

8. All players $i = 1, \ldots, n$ perform the Shuffle protocol on the elements $U_{ij}$ ($1 \leq j \leq k$),
such that each player obtains a joint set $V$.

Each element $a \in P$ that appears $b$ times in $V$ is an element in the threshold set that
appears $b$ times in the players' private inputs.


Figure  3:  Over-Threshold  Set-Union  protocol  for  the  honest-but-curious  case. 


Theorem 10. In the Over-Threshold Set-Union protocol of Fig. 3, every honest-but-curious player
learns each element $a$ which appears at least $t$ times in the union of the $n$ players' private inputs,
as well as the number of times it so appears, with overwhelming probability.

Theorem 11. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically
secure, with overwhelming probability, in the Over-Threshold Set-Union protocol of Fig. 3,
any coalition of fewer than $n$ PPT honest-but-curious players learns no more information than
would be gained by using the same private inputs in the ideal model with a trusted third party.

We provide proof sketches for Theorems 10 and 11 in Appendix D.1.

6.2 Threshold Set-Union


Problem Definition. We define the Threshold Set-Union problem as follows: all players learn
which elements appear in the combined private input of the players at least a threshold number $t$
times. For example, assume that $a$ appears in the combined private input of the players 15 times.
If $t = 10$, then all players learn $a$. However, if $t = 16$, then no player learns $a$. This problem
differs from the Over-Threshold Set-Union problem in that each player learns the elements of
$\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, without learning how often each element appears.

We offer protocols for several variants on Threshold Set-Union: threshold contribution, perfect,
and semi-perfect. Threshold contribution allows for thresholds $t \geq 1$, and each player learns only
those elements which appear both in his private input and the threshold set: player $i$ ($1 \leq i \leq n$)
learns the elements of $S_i \cap \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$. Perfect Threshold Set-Union allows for thresholds
$t \geq 1$, and conforms exactly to the definition of Threshold Set-Union. The semi-perfect variant
requires for security that $t \geq 2$, and that the cheating coalition does not include any single element
more than $t - 1$ times in their private inputs. Note that the information illicitly gained by the
coalition when they include more than $t - 1$ copies of an element $a$ is restricted to a possibility of
learning that there exists some other player whose private input contains $a$. We do not consider
the difference in security between the semi-perfect and perfect variants to be significant.

The protocols for the Threshold Set-Union problem, given in Figs. 4, 5, and 6, are identical to the
protocol for Over-Threshold Set-Union (given in Fig. 3) in steps 1-5. We explain the differences
between the protocols for each variant: threshold contribution, semi-perfect, and perfect. Each
player constructs encryptions of the elements $\Phi((S_i)_j)$ from his private input set in step 6, and
continues as described below.

Threshold Contribution Threshold Set-Union. This protocol is given in Fig. 5. The players
cooperatively decrypt the encrypted elements $\Phi((S_i)_j) * \left(\sum_{\ell=1}^{n} b_{\ell,i,j}\right)$. This decryption must
take place in such a way that only player $i$ learns the element $\Phi((S_i)_j) * \left(\sum_{\ell=1}^{n} b_{\ell,i,j}\right)$. Typically,
parties produce decryption shares and reconstruct the element from them; player $i$ simply
retains his decryption share, so that only he learns the decryption. Thus each player learns
which of his elements appear in the threshold set, since if $(S_i)_j$ appears in the threshold set,
$\Phi((S_i)_j) * \left(\sum_{\ell=1}^{n} b_{\ell,i,j}\right) = 0$. No player learns more information because if an element $(S_i)_j$ is not
in the threshold set, $\Phi((S_i)_j) * \left(\sum_{\ell=1}^{n} b_{\ell,i,j}\right)$ is uniformly distributed.

Semi-Perfect Threshold Set-Union. This protocol is given in Fig. 4. The encrypted element
$(U_i)_j$ calculated from the encrypted evaluation of $\Phi((S_i)_j)$ is either: (1) an encryption of the private
input element $(S_i)_j$ (if $(S_i)_j$ is in the threshold set) or (2) an encryption of a random element
(otherwise). However, the player also constructs a corresponding encrypted tag $T_{ij}$ for each $(U_i)_j$.
We require that the cryptosystem used to construct these tags be key-private, so that the origin of
ciphertext pairs $T$, $U$ cannot be ascertained by the key used to construct the tags.

The  players  then  correctly  obtain  a  decryption  of  each  element  in  the  threshold  set  exactly  once. 
Any  other  time  a  ciphertext  U  for  an  element  in  the  threshold  set  is  decrypted,  a  player  sabotages 
it.  In  group  decryption  schemes,  players  generally  produce  shares  of  the  decrypted  element;  if  one 
player  sends  a  uniformly  generated  share  instead  of  a  valid  one,  the  decrypted  element  is  uniform.  If 
the  decrypted  element  is  uniform,  it  conveys  no  information  to  the  players.  To  ensure  an  encryption 
of  an  element  in  the  threshold  set  is  not  decrypted  once  the  element  is  known  to  be  in  the  threshold 
set, a player sabotages the decryption under the following conditions: (1) he can decrypt the tag

Protocol: Threshold-SemiPerfect-HBC

Input: There are $n \geq 2$ honest-but-curious players, $c < n$ dishonestly colluding, each with
a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$
is the corresponding public key for a homomorphic cryptosystem. The threshold number
of repetitions at which an element appears in the output is $t$. $F$ is a fixed polynomial of
degree $t - 1$ which has no roots representing elements of $P$.

1. Each player $i = 1, \ldots, n$ calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

2. Player 1 sends the encryption of the polynomial $\lambda_1 = f_1$ to player 2

3. Each player $i = 2, \ldots, n$

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} * f_i$ by utilizing the
algorithm given in Sec. 4.2.2.

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \prod_{i=1}^{n} f_i$ to players
$2, \ldots, c + 1$

5. Each player $i = 1, \ldots, c + 1$

(a) calculates the encryption of the $(t - 1)$th derivative of $p$, denoted $p^{(t-1)}$, by repeating
the algorithm given in Sec. 4.2.2.

(b) chooses random polynomials $r_i, s_i \leftarrow R^{nk}[x]$

(c) calculates the encryption of the polynomial $p * r_i + F * p^{(t-1)} * s_i$ and sends it to
all other players

6. Each player $i = 1, \ldots, n$

(a) evaluates the encryption of the polynomial $\Phi = p * \left(\sum_{i=1}^{c+1} r_i\right) + F * p^{(t-1)} * \left(\sum_{i=1}^{c+1} s_i\right)$
at each input $(S_i)_j$, obtaining encrypted elements $E_{pk}(c_{ij})$ where
$c_{ij} = \Phi((S_i)_j)$, using the algorithm given in Sec. 4.2.2.

(b) for each $j = 1, \ldots, k$ calculates an encrypted tag $T_{ij} = \mathrm{Enc}_i(h((S_i)_j) \,\|\, (S_i)_j)$

(c) for each $j = 1, \ldots, k$ chooses a random number $r_{ij} \leftarrow R$ and calculates an
encrypted element $U_{ij} = (r_{ij} \times_h E_{pk}(c_{ij})) +_h E_{pk}((S_i)_j)$

(d) constructs the set $V_i = \{(T_{ij} \,\|\, U_{ij}) \mid 1 \leq j \leq k\}$

7. By using the Shuffle protocol, players perform shuffling on their private input sets $V_i$.

8. For each shuffled element $T \,\|\, U$ in sorted order, each player $i = 1, \ldots, n$

(a) if $D_i(T) = h(a) \,\|\, a$ for some $a$

i. if $a$ has previously been revealed to be in the threshold set, then calculate
an incorrect decryption share of $U$, and send it to all other players

(b) else calculate a decryption share of $U$, and send it to all other players

(c) reconstruct the decryption of $U$. If the element $a \in P$, then $a$ is in the threshold
result set


Figure  4:  Threshold  Set-Union  protocol  for  the  honest-but-curious  case  (semi-perfect  variant). 


to $h(a) \,\|\, a$ for some $a$ and (2) $a$ has already been determined to be a member of the threshold set.
All other ciphertexts should be correctly decrypted; either they are encryptions of elements in the
threshold set which have not yet been decrypted, or they are encryptions of random elements.
Note that this protocol is the only protocol proposed in this paper with a non-constant number

Protocol: Threshold-Contribution-HBC

Input: There are $n \geq 2$ honest-but-curious players, $c < n$ dishonestly colluding, each with
a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$
is the corresponding public key for a homomorphic cryptosystem. The threshold number of
repetitions at which an element appears in the output is $t$. $F$ is a fixed polynomial of degree
$t - 1$ which has no roots representing elements of $P$.

1. Each player $i = 1, \ldots, n$ calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

2. Player 1 sends the encryption of the polynomial $\lambda_1 = f_1$ to player 2

3. Each player $i = 2, \ldots, n$

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} * f_i$ by utilizing the
algorithm given in Sec. 4.2.2.

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \prod_{i=1}^{n} f_i$ to players
$2, \ldots, c + 1$

5. Each player $i = 1, \ldots, c + 1$

(a) calculates the encryption of the $(t - 1)$th derivative of $p$, denoted $p^{(t-1)}$, by repeating
the algorithm given in Sec. 4.2.2.

(b) chooses random polynomials $r_i, s_i \leftarrow R^{nk}[x]$

(c) calculates the encryption of the polynomial $p * r_i + F * p^{(t-1)} * s_i$ and sends it to
all other players

6. Each player $i = 1, \ldots, n$

(a) evaluates the encryption of the polynomial $\Phi = p * \left(\sum_{i=1}^{c+1} r_i\right) + F * p^{(t-1)} * \left(\sum_{i=1}^{c+1} s_i\right)$
at each input $(S_i)_j$, obtaining encrypted elements $E_{pk}(c_{ij})$ where
$c_{ij} = \Phi((S_i)_j)$, using the algorithm given in Sec. 4.2.2.

(b) sends the ciphertexts $E_{pk}(c_{ij})$ ($1 \leq j \leq k$) to all other players

(c) chooses a random element $b_{i,j,\ell} \leftarrow R$ ($1 \leq j \leq n$, $1 \leq \ell \leq k$)

(d) for each ciphertext $E_{pk}(c_{j\ell})$, calculates $b_{i,j,\ell} \times_h E_{pk}(c_{j\ell})$ ($1 \leq j \leq n$, $1 \leq \ell \leq k$)

7. The players $i$ ($1 \leq i \leq n$) calculate $U_{jm} = \left(\sum_{i=1}^{n} b_{i,j,m}\right) \times_h E_{pk}(c_{jm})$ ($1 \leq j \leq n$,
$1 \leq m \leq k$)

8. All players decrypt the ciphertexts $U_{ij}$, so that only player $i$ learns the decryption
$d_{ij}$.

For each player $i$ ($1 \leq i \leq n$), if the decryption $d_{ij}$ of $U_{ij}$ equals 0 ($1 \leq j \leq k$), then $(S_i)_j$ is in his result set.


Figure  5:  Threshold  Set-Union  protocol  for  the  honest-but-curious  case  (threshold  contribution 
variant). 

of rounds. Because of the need to sabotage decryptions based on the results of past decryptions, there are $O(nk)$ rounds in this protocol.
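The polynomial arithmetic behind Fig. 5 can be illustrated in the clear. The following is a minimal sketch, not the protocol itself: it omits encryption entirely, uses a prime field of size $2^{61}-1$ as a stand-in for the plaintext ring $R$, and lets a single party pick $r$ and $s$. The helper names (`poly_from_roots`, `threshold_polynomial`, and so on) and the toy inputs are assumptions made for illustration. It shows that an element appearing at least $t$ times among the inputs is a root of both $p$ and $p^{(t-1)}$, and therefore of $\Phi = p * r + F * p^{(t-1)} * s$.

```python
import random

Q = 2**61 - 1  # Mersenne prime; assumed stand-in for the plaintext ring R


def poly_from_roots(roots):
    """Coefficients (lowest degree first) of the product of (x - a) over the roots."""
    coeffs = [1]
    for a in roots:
        shifted = [0] + coeffs                          # x * f
        scaled = [(-a * c) % Q for c in coeffs] + [0]   # -a * f
        coeffs = [(u + v) % Q for u, v in zip(shifted, scaled)]
    return coeffs


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % Q
    return out


def poly_add(f, g):
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(a + b) % Q for a, b in zip(f, g)]


def poly_deriv(f, times=1):
    """Formal derivative, applied `times` times."""
    for _ in range(times):
        f = [(i * c) % Q for i, c in enumerate(f)][1:] or [0]
    return f


def poly_eval(f, x):
    acc = 0
    for c in reversed(f):  # Horner's rule, highest coefficient first
        acc = (acc * x + c) % Q
    return acc


def threshold_polynomial(sets, t, F, rng):
    """Return (p, p^(t-1), Phi) for the given plaintext input sets."""
    p = [1]
    for elems in sets:
        p = poly_mul(p, poly_from_roots(elems))
    dp = poly_deriv(p, t - 1)
    r = [rng.randrange(Q) for _ in p]  # random masking polynomials
    s = [rng.randrange(Q) for _ in p]
    phi = poly_add(poly_mul(p, r), poly_mul(poly_mul(F, dp), s))
    return p, dp, phi


# Toy inputs: 5 appears three times, 3 twice, 7 once; threshold t = 2.
sets = [[3, 5], [5, 7], [5, 3]]
t = 2
F = poly_from_roots([11])  # degree t - 1; 11 is assumed to lie outside P
p, dp, phi = threshold_polynomial(sets, t, F, random.Random(0))
```

Here `poly_eval(phi, 5)` and `poly_eval(phi, 3)` are zero regardless of the random choices, while 7 is a root of `p` but not of its first derivative, so its evaluation of `phi` is masked by the random polynomial `s`.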

Perfect Threshold Set-Union. This protocol is given in Fig. 6. Each player constructs the encrypted elements $(U_i)_j$ from the encrypted evaluation of $\Phi((S_i)_j)$ as written in step 6 of Figure 4.
Protocol: Threshold-Perfect-HBC

Input: There are $n \geq 2$ honest-but-curious players, $c < n$ dishonestly colluding, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key for a homomorphic cryptosystem. The threshold number of repetitions at which an element appears in the output is $t \geq 2$. $F$ is a fixed polynomial of degree $t-1$ which has no roots representing elements of $P$. $\mathrm{IsEq}(C, C') = 1$ if the ciphertexts $C, C'$ encode the same plaintext, and 0 otherwise.

1. Each player $i = 1, \ldots, n$ calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

2. Player 1 sends the encryption of the polynomial $\lambda_1 = f_1$ to player 2

3. Each player $i = 2, \ldots, n$

(a) receives the encryption of the polynomial $\lambda_{i-1}$ from player $i - 1$

(b) calculates the encryption of the polynomial $\lambda_i = \lambda_{i-1} * f_i$ by utilizing the algorithm given in Sec. 4.2.2.

(c) sends the encryption of the polynomial $\lambda_i$ to player $i + 1 \bmod n$

4. Player 1 distributes the encryption of the polynomial $p = \lambda_n = \prod_{i=1}^{n} f_i$ to players $2, \ldots, c + 1$

5. Each player $i = 1, \ldots, c + 1$

(a) calculates the encryption of the $(t-1)$th derivative of $p$, denoted $p^{(t-1)}$, by repeating the algorithm given in Sec. 4.2.2.

(b) chooses random polynomials $r_i, s_i$

(c) calculates the encryption of the polynomial $p * r_i + F * p^{(t-1)} * s_i$ and sends it to all other players

6. Each player $i = 1, \ldots, n$

(a) evaluates the encryption of the polynomial $\Phi = p * \left(\sum_{i=1}^{c+1} r_i\right) + F * p^{(t-1)} * \left(\sum_{i=1}^{c+1} s_i\right)$ at each input $(S_i)_j$, obtaining encrypted elements $E_{pk}(c_{ij})$ where $c_{ij} = \Phi((S_i)_j)$, using the algorithm given in Sec. 4.2.2, and sends them to all players

(b) for each $i' = 1, \ldots, n$, $j = 1, \ldots, k$ chooses a random number $r_{i',j} \leftarrow R$ and calculates an encrypted element $U_{i'j} = (r_{i',j} \times_h E_{pk}(c_{i'j}))$, and sends it to player $i'$

(c)  calculates  the  elements  for  j  =  1, . . . ,  A: 

Eij  =  ix\j  Xh  Fph{cij^^  h  F h  Epki^nj^')  Fh  Eph{{Si)j') 

(d) constructs the set $V_i = \{U_{ij} \mid 1 \leq j \leq k\}$

7. By using the Shuffle protocol, all players perform shuffling on their private input sets $V_i$, obtaining the set $U'$.

8. For each shuffled ciphertext $U'_\ell$ with arbitrary ordering index $\ell \in [nk]$, the players $i = 1, \ldots, n$

(a) each choose random elements $q_{i,\ell} \leftarrow R$

(b) calculate $W_\ell = U'_\ell +_h E_{pk}\left(\left(\sum_{i=1}^{n} q_{i,\ell}\right)\left(\mathrm{IsEq}(U'_\ell, U'_{\ell-1}) + \cdots + \mathrm{IsEq}(U'_\ell, U'_1)\right)\right)$

9. All players $1, \ldots, n$ decrypt each ciphertext $W_\ell$, obtaining an element $a_\ell$ ($1 \leq \ell \leq nk$).

If $a_\ell \in P$ ($1 \leq \ell \leq nk$), then $a_\ell$ is a member of the result set.


Figure  6:  Threshold  Set-Union  protocol  for  the  honest-but-curious  case  (perfect  variant). 
The players then utilize the Shuffle protocol to anonymously distribute these elements. If an element appears in the threshold set, then at least one encryption of it appears in the shuffled ciphertexts. The players ensure in step 8 that all duplicates (ciphertexts of the same element) except the first have a random element added to them. This disguises the number of players who have each element of the threshold set in their private input. Let the shuffled ciphertexts $U'$ have an arbitrary ordering $U'_1, \ldots, U'_{nk}$. $\mathrm{IsEq}(C, C') = 1$ if the ciphertexts $C, C'$ encode the same plaintext, and 0 otherwise. (This calculation can be achieved with the techniques in [20].) The players $i \in [n]$ then choose random elements $q_{i,\ell} \leftarrow R$ ($1 \leq \ell \leq nk$) and decrypt the ciphertexts $W_\ell = U'_\ell +_h E_{pk}\left(\left(\sum_{i=1}^{n} q_{i,\ell}\right)\left(\mathrm{IsEq}(U'_\ell, U'_{\ell-1}) + \cdots + \mathrm{IsEq}(U'_\ell, U'_1)\right)\right)$. Thus, if $U'_\ell$ is a duplicate (an encryption of an element which also appeared earlier in the ordering), it has a uniformly distributed element added to it, and conveys no information. Each element of the threshold set is decrypted exactly once, and all players thus learn the threshold set.
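The duplicate-masking step can be mimicked on plaintexts to see which positions survive. The sketch below is an illustration under stated assumptions: it works on plain integers rather than ciphertexts, replaces the IsEq sub-protocol with ordinary equality, and uses a prime modulus as a stand-in for $R$; the function name `mask_duplicates` is hypothetical.

```python
import random

Q = 2**61 - 1  # assumed prime modulus standing in for the plaintext ring R


def mask_duplicates(shuffled, n_players, rng):
    """Plaintext analogue of W_l = U'_l + (sum_i q_{i,l}) * (number of earlier equal entries)."""
    out = []
    for idx, value in enumerate(shuffled):
        # Plays the role of IsEq(U'_l, U'_{l-1}) + ... + IsEq(U'_l, U'_1).
        earlier_matches = sum(1 for prev in shuffled[:idx] if prev == value)
        # Each player contributes one random element q_{i,l}.
        q_sum = sum(rng.randrange(Q) for _ in range(n_players)) % Q
        out.append((value + q_sum * earlier_matches) % Q)
    return out


shuffled = [5, 3, 5, 5, 3]
masked = mask_duplicates(shuffled, n_players=3, rng=random.Random(1))
```

The first occurrence of each value is left unchanged because its match count is zero; every later occurrence has a random multiple added to it.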

Security Analysis. We show that our protocol is correct, as each player learns the appropriate result set at its termination, and secure in the honest-but-curious model, as no player gains information that it would not gain when using its input in the ideal model. A formal statement of these properties is as follows:

Theorem 12. In the Threshold Contribution Threshold Set-Union protocol of Fig. 5, every player $i$ ($1 \leq i \leq n$) learns the set $S_i \cap \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability.

Theorem 13. In the Semi-Perfect Threshold Set-Union protocol of Fig. 4, each player $i$ ($1 \leq i \leq n$) learns the set $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability.

Theorem 14. In the Perfect Threshold Set-Union protocol of Fig. 6, every player learns the set $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability.

Theorem 15. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure and that the Shuffle protocol is secure, with overwhelming probability, in the Threshold Set-Union protocols of Figs. 4, 5, and 6, any coalition of fewer than $n$ PPT honest-but-curious players learns no more information than would be gained by using the same private inputs in the ideal model with a trusted third party.

We  provide  proof  sketches  for  Theorems  12,  13,  14,  and  15  in  Appendix  D.2. 

6.3  Malicious  Case 

By  adding  zero-knowledge  proofs  to  our  Over-Threshold  Set-Union  protocol  secure  against  honest- 
but-curious  adversaries,  we  extend  our  results  to  enable  security  against  malicious  adversaries.  We 
provide  details  of  our  protocol  secure  against  malicious  adversaries  in  Section  7.4,  and  proof  of 
security in Appendix D.1.

7  Set-Intersection,  Cardinality  Set-Intersection,  and  Over- 
Threshold  Set-Union  for  Malicious  Parties 

We  extend  the  protocols  for  the  Set-Intersection,  Cardinality  Set-Intersection,  and  Over-Threshold 
Set-Union  problems  given  in  Sections  5  and  6  to  obtain  security  against  adversaries  in  the  malicious 
model. To obtain this result, we add zero-knowledge proofs, verified by all players, to ensure the
correctness  of  all  computation.  In  this  section,  we  first  introduce  notation  for  zero-knowledge 
proofs,  then  give  the  protocols  secure  against  malicious  parties. 

7.1  Tools 

In  this  section,  we  describe  cryptographic  tools  that  we  utilize  in  our  protocols  secure  against 
malicious  players. 

Zero-Knowledge  Proofs.  We  utilize  several  zero-knowledge  proofs  in  our  protocols  for  the 
malicious  adversary  model.  We  introduce  the  notation  for  these  zero-knowledge  proofs  below;  for 
additively  homomorphic  cryptosystems  such  as  Paillier,  we  can  efficiently  construct  these  zero- 
knowledge  proofs  using  standard  constructions  [6,  4] . 

• $\mathrm{POPK}\{E_{pk}(x)\}$ denotes a zero-knowledge proof that given a public ciphertext $E_{pk}(x)$, the player knows the corresponding plaintext $x$ [7].

• $\mathrm{ZKPK}\{f \mid p' = f *_h \alpha\}$ is shorthand notation for a zero-knowledge proof of knowledge that the prover knows a polynomial $f$ such that the encrypted polynomial $p' = f *_h \alpha$, given the encrypted polynomials $p'$ and $\alpha$.

• $\mathrm{ZKPK}\{f \mid (p' = f *_h \alpha) \wedge (y = E_{pk}(f))\}$ is the proof $\mathrm{ZKPK}\{f \mid p' = f *_h \alpha\}$ with the additional constraint that $y = E_{pk}(f)$ ($y$ is the encryption of $f$), given the encrypted polynomials $p'$, $y$, and $\alpha$.

Equivocal Commitment. A standard commitment scheme allows parties to give a "sealed envelope" that can be later opened to reveal exactly one value. We use an equivocal commitment scheme in our protocols secure against malicious players, such that the simulator can open the "envelope" to an arbitrary value without being detected by the adversary [19, 23].
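To make the notion of equivocation concrete, the following sketch uses a Pedersen-style commitment, which is one well-known scheme that becomes equivocal for whoever knows a trapdoor. This is an assumption made for illustration only: the text does not say which scheme from [19, 23] is intended, and the parameters below are toy values with no security.

```python
P_MOD = 23     # toy prime modulus (illustrative only, far too small for real use)
Q_ORD = 11     # order of the subgroup generated by G
G = 2          # 2 has order 11 modulo 23
TRAPDOOR = 7   # discrete log of H; in a simulation proof only the simulator knows it
H = pow(G, TRAPDOOR, P_MOD)


def commit(message, randomness):
    """Commitment g^m * h^r mod p."""
    return (pow(G, message % Q_ORD, P_MOD) * pow(H, randomness % Q_ORD, P_MOD)) % P_MOD


def equivocate(message, randomness, new_message):
    """Using the trapdoor, find r' such that commit(new_message, r') equals commit(message, randomness)."""
    inverse = pow(TRAPDOOR, -1, Q_ORD)  # modular inverse, Python 3.8+
    return (randomness + (message - new_message) * inverse) % Q_ORD


c = commit(4, 9)
r_new = equivocate(4, 9, 8)
```

With these values `commit(8, r_new)` equals `c`, so the same "envelope" opens to either 4 or 8 for a party holding the trapdoor, while a party without it is bound to the value it committed to.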

7.2  Set-Intersection  Protocol  for  Malicious  Adversaries 

Our protocol for malicious parties performing Set-Intersection, given in Fig. 7, proceeds largely as the protocol secure against honest-but-curious parties, which was given in Fig. 1. The commitments to the data items $\Lambda(r_{i,j})$ are purely for the purposes of a simulation proof. We add zero-knowledge proofs to prevent three forms of misbehavior: choosing ciphertexts for the encrypted coefficients of $f_i$ without knowledge of their plaintext, not performing the polynomial multiplication of $f_j * r_{i,j}$ correctly, and not performing decryption correctly. We also constrain the leading coefficient of $f_i$ to be 1 for all players, to prevent any player from setting their polynomial to 0; if $f_i = 0$, every element is a root, and thus it can represent an unlimited number of elements. We can thus detect or prevent misbehavior from malicious players, forcing this protocol to operate like the honest-but-curious protocol in Fig. 1. The protocol can gain efficiency by taking advantage of the maximum coalition size $c$.

Our set-intersection protocol secure against malicious parties utilizes an expensive ($O(k^2)$ size) zero-knowledge proof to prevent malicious parties from cheating when multiplying the polynomial $r_{i,j}$ by the encryption of the polynomial $f_j$. Each player $i$ must commit to each polynomial $r_{i,j}$ ($1 \leq i, j \leq n$), for purposes of constructing a zero-knowledge proof. We may easily replace this proof with use of the cut-and-choose technique, which requires only $O(k)$ communication.
Protocol: Set-Intersection-Mal

Input: There are $n \geq 2$ players, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key to a homomorphic cryptosystem. The commitment scheme used in this protocol is an equivocal commitment scheme.

All players verify the correctness of all proofs sent to them, and stop participating in the protocol if any are not correct.

Each player $i = 1, \ldots, n$:

1. (a) calculates the polynomial $f_i$ such that the $k$ roots of the polynomial are the elements of $S_i$, as $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

(b) sends $\delta_i$, the encryption of the polynomial $f_i$, to all other players along with proofs of plaintext knowledge for all coefficients except the leading coefficient ($\mathrm{POPK}\{(\delta_i)_j\}$, $0 \leq j < k$).

(c) for $1 \leq j \leq n$

i. chooses a random polynomial $r_{i,j}$

ii. sends a commitment to $\Lambda(r_{i,j})$ to all players, where $\Lambda(r_{i,j}) = E_{pk}(r_{i,j})$

2. for $1 \leq j \leq n$

(a) opens the commitment to $\Lambda(r_{i,j})$

(b) verifies proofs of plaintext knowledge for the encrypted coefficients of $f_j$

(c) sets the leading encrypted coefficient (for $x^k$) to a known encryption of 1

(d) calculates $\mu_{i,j}$, the encryption of the polynomial $p_{i,j} = f_j * r_{i,j}$, with proofs of correct multiplication $\mathrm{ZKPK}\{r_{i,j} \mid (\mu_{i,j} = r_{i,j} *_h \delta_j) \wedge (\Lambda(r_{i,j}) = E_{pk}(r_{i,j}))\}$ and sends it to all other players

3. All players

(a) calculate the encryption of the polynomial $p = \sum_{i=1}^{n} \sum_{j=1}^{n} p_{i,j} = \sum_{j=1}^{n} f_j * \left(\sum_{i=1}^{n} r_{i,j}\right)$ as in Sec. 4.2.2, and verify all attached proofs

(b) perform a group decryption to obtain the polynomial $p$, and distribute proofs of correct decryption

Each player $i = 1, \ldots, n$ determines the intersection multiset as follows: for each $a \in S_i$, he calculates $b$ such that $(x - a)^b \mid p \wedge (x - a)^{b+1} \nmid p$. The element $a$ appears $b$ times in the intersection multiset.


Figure  7:  Set-Intersection  protocol  for  the  malicious  case. 
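The final step of Fig. 7, finding the largest $b$ with $(x - a)^b \mid p$, can be carried out by repeated synthetic division. The sketch below is a plaintext illustration under stated assumptions: no encryption or proofs, a prime field of size $2^{61}-1$ in place of $R$, and one random mask of degree $k$ per input polynomial standing in for $\sum_i r_{i,j}$. All function names are hypothetical.

```python
import random

Q = 2**61 - 1  # assumed prime modulus standing in for the plaintext ring R


def poly_from_roots(roots):
    """Coefficients (lowest degree first) of the product of (x - a) over the roots."""
    coeffs = [1]
    for a in roots:
        shifted = [0] + coeffs
        scaled = [(-a * c) % Q for c in coeffs] + [0]
        coeffs = [(u + v) % Q for u, v in zip(shifted, scaled)]
    return coeffs


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % Q
    return out


def poly_add(f, g):
    n = max(len(f), len(g))
    return [(a + b) % Q for a, b in zip(f + [0] * (n - len(f)), g + [0] * (n - len(g)))]


def divide_by_linear(p, a):
    """Return (quotient, remainder) of p divided by (x - a), via synthetic division."""
    acc, rows = 0, []
    for c in reversed(p):          # highest coefficient first
        acc = (acc * a + c) % Q
        rows.append(acc)
    remainder = rows.pop()         # the last accumulator is p(a)
    return list(reversed(rows)), remainder


def multiplicity(p, a):
    """Largest b such that (x - a)^b divides p."""
    b = 0
    while len(p) > 1:
        quotient, remainder = divide_by_linear(p, a)
        if remainder != 0:
            break
        p, b = quotient, b + 1
    return b


def intersection_polynomial(sets, rng):
    """p = sum_j f_j * mask_j, with mask_j a random polynomial of degree k."""
    k = len(sets[0])
    p = [0]
    for elems in sets:
        mask = [rng.randrange(Q) for _ in range(k + 1)]
        p = poly_add(p, poly_mul(poly_from_roots(elems), mask))
    return p


sets = [[2, 2, 5], [2, 2, 9], [2, 5, 2]]
p = intersection_polynomial(sets, random.Random(3))
```

Every input contains 2 at least twice, so $(x - 2)^2$ divides each $f_j$ and hence $p$; `multiplicity(p, 2)` is therefore at least 2, and equals 2 except with negligible probability over the random masks.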


Security Analysis. We provide a simulation proof of this protocol's security; an intermediary $G$ translates between the real world with malicious, colluding PPT players $\Gamma$ and the ideal world, where a trusted third party computes the answer set. Our proof shows that no $\Gamma$ can distinguish between the ideal world and the real world, thus no information other than that in the answer set can be gained by malicious players. A formal statement of our security property is as follows:

Theorem 16. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, and the specified zero-knowledge proofs and proofs of correct decryption cannot be forged, then in the Set-Intersection protocol for the malicious case in Fig. 7, for any coalition $\Gamma$ of colluding players (at most $n - 1$ such colluding parties), there is a player (or group of players) $G$ operating in the ideal model, such that the views of the players in the ideal model are computationally indistinguishable from the views of the honest players and $\Gamma$ in the real model.

Proof of this theorem is given in Appendix C.1.

7.3  Cardinality  Set-Intersection  Protocol  for  Malicious  Adversaries 

We give a protocol, secure against malicious parties, to perform Cardinality Set-Intersection in Fig. 8. It proceeds largely as the protocol secure against honest-but-curious parties, which was given in Fig. 2. The commitments to the data items $\Lambda(r_{i,j})$ are purely for the purposes of a simulation proof. We add zero-knowledge proofs of knowledge to prevent five forms of misbehavior: choosing $f_i$ without knowledge of its roots, choosing $f_i$ such that it is not the product of linear factors, not performing the polynomial multiplication of $f_j * r_{i,j}$ correctly, not calculating encrypted elements $(V_i)_j$ correctly (either not from the data items $(S_i)_j$ or not evaluating the encrypted polynomial $p$), and not performing decryption correctly. We can thus detect or prevent misbehavior from malicious players, forcing this protocol to operate like the honest-but-curious protocol in Fig. 2.

Security Analysis. We provide a simulation proof of this protocol's security; an intermediary $G$ translates between the real world with malicious, colluding PPT players $\Gamma$ and the ideal world, where a trusted third party computes the answer set. Our proof shows that no $\Gamma$ can distinguish between the ideal world and the real world, thus no information other than that in the answer set can be gained by malicious players. A formal statement of our security property is as follows:

Theorem 17. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, the Shuffle protocol is secure, and the specified zero-knowledge proofs and proofs of correct decryption cannot be forged, then in the Cardinality Set-Intersection protocol for the malicious case in Fig. 8, for any coalition $\Gamma$ of colluding players (at most $n - 1$ such colluding parties), there is a player (or group of players) $G$ operating in the ideal model, such that the views of the players in the ideal model are computationally indistinguishable from the views of the honest players and $\Gamma$ in the real model.

Proof  of  this  theorem  is  given  in  Appendix  C.2. 

7.4  Over-Threshold  Set-Union  Protocol  for  Malicious  Adversaries 

We give a protocol, secure against malicious parties, to perform Over-Threshold Set-Union in Fig. 9. It proceeds largely as the protocol secure against honest-but-curious parties, which was given in Fig. 3. The commitments to the data items $\Lambda(r_{i,j})$ are purely for the purposes of a simulation proof. We add zero-knowledge proofs of knowledge to prevent six forms of misbehavior: choosing $f_i$ without knowledge of its roots, choosing $f_i$ such that it is not the product of linear factors, not performing the polynomial multiplication of $f_j * \lambda_{j-1}$ correctly, not calculating $\alpha_i = p * r_i$ or $\beta_i = F * p^{(t-1)} * s_i$ correctly, not calculating encrypted elements $(V_i)_j$ correctly (either not from the data items $(S_i)_j$ or not evaluating the encrypted polynomial $\Phi$), and not performing decryption correctly. We can thus detect or prevent misbehavior from malicious players, forcing this protocol to operate like the honest-but-curious protocol in Fig. 3.
Protocol: Cardinality-Mal

Input: There are $n \geq 2$ players, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key to a homomorphic cryptosystem. The commitment scheme used in this protocol is an equivocal commitment scheme.

All players verify the correctness of all proofs sent to them, and stop participating in the protocol if any are not correct.

Each player $i = 1, \ldots, n$:

1. (a) calculates the polynomial $f_i$ such that the $k$ roots of the polynomial are the elements of $S_i$, as $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

(b) sends:

i. encrypted elements $y_{i,1} = E_{pk}((S_i)_1), \ldots, y_{i,k} = E_{pk}((S_i)_k)$ to all other players, along with proofs of plaintext knowledge ($\mathrm{POPK}\{E_{pk}(y_{i,j})\}$, $1 \leq j \leq k$)

ii. $\delta_i$, the encryption of the polynomial $f_i$, to all other players, along with a proof of correct construction

$\mathrm{ZKPK}\left\{a_1, \ldots, a_k \;\middle|\; \left(\tau_i = (x - a_1) *_h \cdots (x - a_{k-1}) *_h \alpha\right) \wedge \left(\alpha = E_{pk}(x - a_k)\right) \wedge \left(y_{i,1} = E_{pk}(a_1)\right) \wedge \cdots \wedge \left(y_{i,k} = E_{pk}(a_k)\right)\right\}$

(c) for $1 \leq j \leq n$

i. chooses a random polynomial $r_{i,j}$

ii. sends a commitment to $\Lambda(r_{i,j})$ to all players, where $\Lambda(r_{i,j}) = E_{pk}(r_{i,j})$

2. for $1 \leq j \leq n$

(a) opens the commitment to $\Lambda(r_{i,j})$

(b) verifies proofs of plaintext knowledge for the encrypted coefficients of $f_j$

(c) sets the leading encrypted coefficient (for $x^k$) to a known encryption of 1

(d) calculates $\tau_{i,j}$, the encryption of the polynomial $p_{i,j} = f_j * r_{i,j}$, with proofs of correct multiplication $\mathrm{ZKPK}\{r_{i,j} \mid (\tau_{i,j} = r_{i,j} *_h \delta_j) \wedge (\Lambda(r_{i,j}) = E_{pk}(r_{i,j}))\}$ and sends it to all other players

3. Each player $i = 1, \ldots, n$:

(a) calculates $\mu$, the encryption of the polynomial $p = \sum_{i=1}^{n} \sum_{j=1}^{n} p_{i,j}$, as in Sec. 4.2.2, and verifies all attached proofs

(b) evaluates the encryption of the polynomial $p$ at each input $(S_i)_j$, obtaining encrypted elements $E_{pk}(c_{ij})$ where $c_{ij} = p((S_i)_j)$, using the algorithm given in Sec. 4.2.2.

(c) for each $j \in [k]$ chooses a random element $r_{i,j}$, calculates an encrypted element $(V_i)_j = r_{i,j} \times_h E_{pk}(c_{ij})$, with attached proof of correct construction $\mathrm{ZKPK}\{(r_{i,j}, z) \mid ((V_i)_j = r_{i,j} \times_h \mu(z)) \wedge (y_{i,j} = E_{pk}(z))\}$, and sends the encrypted element $(V_i)_j$ and the proof of correct construction to all players

4. All players perform the Shuffle protocol on the sets $V_i$, obtaining a joint set $V$, in which all ciphertexts have been re-randomized.

5. All players $1, \ldots, n$ decrypt each element of the shuffled set $V$ (and send proofs of correct decryption to all other players)

If $nb$ of the decrypted elements from $V$ are 0, then the size of the set intersection is $b$.


Figure  8:  Cardinality  set-intersection  protocol  for  the  malicious  case. 
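The counting rule at the end of Fig. 8 (that $nb$ zeros correspond to an intersection of size $b$) can be checked on plaintexts. This is a rough sketch under stated assumptions: encryption, proofs, and the Shuffle protocol are replaced by ordinary arithmetic and an in-memory shuffle, the inputs are sets without repeated elements, the blinding factors are drawn nonzero for simplicity, and the names are hypothetical.

```python
import random

Q = 2**61 - 1  # assumed prime modulus standing in for the plaintext ring R


def eval_from_roots(roots, x):
    """Evaluate f(x) = prod (x - a) directly, without expanding coefficients."""
    acc = 1
    for a in roots:
        acc = acc * (x - a) % Q
    return acc


def eval_poly(coeffs, x):
    acc = 0
    for c in reversed(coeffs):  # Horner's rule
        acc = (acc * x + c) % Q
    return acc


def intersection_cardinality(sets, rng):
    n, k = len(sets), len(sets[0])
    # One random mask per input polynomial f_j; it plays the role of sum_i r_{i,j}.
    masks = [[rng.randrange(Q) for _ in range(k + 1)] for _ in sets]

    def p(x):
        return sum(eval_from_roots(s, x) * eval_poly(m, x) for s, m in zip(sets, masks)) % Q

    # Each player blinds p(a) for each of its own elements; nonzero factors keep zeros as zeros only.
    blinded = [rng.randrange(1, Q) * p(a) % Q for s in sets for a in s]
    rng.shuffle(blinded)  # stands in for the Shuffle protocol
    return blinded.count(0) // n


sets = [[1, 2, 3], [2, 3, 4], [3, 2, 9]]
size = intersection_cardinality(sets, random.Random(7))
```

Each of the three players holds both common elements 2 and 3, giving six zeros among the nine blinded values, so the reported size is 2 except with negligible probability over the masks.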
Protocol: OverThreshold-Mal

Input: There are $n \geq 2$ players, $c < n$ maliciously colluding, each with a private input set $S_i$, such that $|S_i| = k$. The players share the secret key $sk$, to which $pk$ is the corresponding public key to a homomorphic cryptosystem. The commitment scheme used in this protocol is an equivocal commitment scheme. The threshold number of repetitions at which an element appears in the output is $t$. $F$ is a fixed polynomial of degree $t-1$ which has no roots representing elements of $P$.

All players verify the correctness of all proofs sent to them, and refuse to participate in the protocol if any are not correct.

Each player $i = 1, \ldots, n$:


1. Each player $i = 1, \ldots, n$ calculates the polynomial $f_i = (x - (S_i)_1) \cdots (x - (S_i)_k)$

2. Players $1, \ldots, c + 1$ send commitments to $y_{i,1}, \ldots, y_{i,k}$ to all players, where $y_{i,j} = E_{pk}((S_i)_j)$ ($1 \leq j \leq k$). All players then open these commitments.

3. Player 1 sends to all other players: encrypted elements $y_{1,1} = E_{pk}((S_1)_1), \ldots, y_{1,k} = E_{pk}((S_1)_k)$, along with proofs of plaintext knowledge ($\mathrm{POPK}\{E_{pk}(y_{1,j})\}$, $1 \leq j \leq k$); $\tau_1$, the encryption of the polynomial $\lambda_1 = f_1$, along with a proof of correct construction

$\mathrm{ZKPK}\left\{a_1, \ldots, a_k \;\middle|\; \left(\tau_1 = (x - a_1) *_h \cdots (x - a_{k-1}) *_h \alpha\right) \wedge \left(\alpha = E_{pk}(x - a_k)\right) \wedge \left(y_{1,1} = E_{pk}(a_1)\right) \wedge \cdots \wedge \left(y_{1,k} = E_{pk}(a_k)\right)\right\}$

4. Each player $i = 2, \ldots, n$

(a) receives $\tau_{i-1}$, the encryption of the polynomial $\lambda_{i-1}$, from player $i - 1$

(b) sends to all other players: encrypted elements $y_{i,1} = E_{pk}((S_i)_1), \ldots, y_{i,k} = E_{pk}((S_i)_k)$, along with proofs of plaintext knowledge ($\mathrm{POPK}\{E_{pk}(y_{i,j})\}$, $1 \leq j \leq k$); $\tau_i$, the encryption of the polynomial $\lambda_i = f_i * \lambda_{i-1}$, along with a proof of correct construction

$\mathrm{ZKPK}\left\{a_1, \ldots, a_k \;\middle|\; \left(\tau_i = (x - a_1) *_h \cdots *_h (x - a_k) *_h \tau_{i-1}\right) \wedge \left(y_{i,1} = E_{pk}(a_1)\right) \wedge \cdots \wedge \left(y_{i,k} = E_{pk}(a_k)\right)\right\}$

5. Each player $i = 1, \ldots, c + 1$

(a) chooses random polynomials $r_i, s_i$

(b) calculates the encryption of the $(t-1)$th derivative of $p = \lambda_n$, denoted $p^{(t-1)}$, by repeating the algorithm given in Sec. 4.2.2.

(c)  calculate  ai,  the  encryptions  of  the  polynomial  p  *  rp  and  f3i,  the  encryption  of  the 

polynomial  *  Si  and  send  it  to  all  other  players,  along  with  proofs  of  correct 

polynomial  multiplication,  ZKPKjri  |  Oj  =  *h  Xn},  ZKPKjsi  \  Pi  =  Si  *h 


6. Each player $i = 1, \ldots, n$:

(a) calculates $\mu$, the encryption of the polynomial $\Phi = p * \left(\sum_{i=1}^{c+1} r_i\right) + F * p^{(t-1)} * \left(\sum_{i=1}^{c+1} s_i\right)$, as in Sec. 4.2.2, and verifies all attached proofs

(b) evaluates the encryption of the polynomial $\Phi$ at each input $(S_i)_j$, obtaining encrypted elements $E_{pk}(c_{ij})$ where $c_{ij} = \Phi((S_i)_j)$, using the algorithm given in Sec. 4.2.2.

(c) for each $j \in [k]$ chooses a random element $r_{i,j} \leftarrow R$, calculates an encrypted element $(V_i)_j = (r_{i,j} \times_h E_{pk}(c_{ij})) + (S_i)_j$, with attached proof of correct construction $\mathrm{ZKPK}\{(r_{i,j}, z) \mid ((V_i)_j = (r_{i,j} \times_h \mu(z)) + z) \wedge (y_{i,j} = E_{pk}(z))\}$, and sends the encrypted element $(V_i)_j$ and the proof of correct construction to all players

7. All players perform the Shuffle protocol on the sets $V_i$, obtaining a joint set $V$, in which all ciphertexts have been re-randomized, then jointly decrypt each element of the shuffled set $V$ (and send proofs of correct decryption to all other players).

Each element $a \in P$ that appears $b$ times in $V$ is an element in the threshold set that appears $b$ times in the players' private inputs.


Figure 9: Over-threshold set-union protocol for the malicious case.
Security Analysis. We provide a simulation proof of this protocol's security; an intermediary $G$ translates between the real world with malicious, colluding PPT players $\Gamma$ and the ideal world, where a trusted third party computes the answer set. Our proof shows that no $\Gamma$ can distinguish between the ideal world and the real world, thus no information other than that in the answer set can be gained by malicious players. A formal statement of our security property is as follows:

Theorem 18. Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, the Shuffle protocol is secure, and the specified zero-knowledge proofs and proofs of correct decryption cannot be forged, then in the Over-Threshold Set-Union protocol for the malicious case in Fig. 9, for any coalition $\Gamma$ of colluding players (at most $n - 1$ such colluding parties), there is a player (or group of players) $G$ operating in the ideal model, such that the views of the players in the ideal model are computationally indistinguishable from the views of the honest players and $\Gamma$ in the real model.

Proof of this theorem is given in Appendix D.1.


8  Other  Applications 

Our  techniques  for  privacy-preserving  computation  of  multiset  operations  have  wide  applicability 
beyond  the  protocols  discussed  earlier  in  Sections  5  and  6.  We  first  discuss  the  composition  of  our 
techniques  to  compute  arbitrary  functions  based  on  the  intersection,  union,  and  reduction  operators. 
We also propose an efficient method for the Subset problem, determining whether $A \subseteq B$. As an
example  of  the  application  of  our  techniques  to  problems  outside  the  realm  of  set  computation,  we 
describe  their  use  in  evaluation  of  boolean  formulas. 

8.1  General  Set  Computation 

Our  techniques  for  privacy-preserving  set  operations  can  be  arbitrarily  composed  to  enable  a  wide 
range  of  privacy-preserving  set  computations.  In  particular,  we  give  a  grammar  describing  functions 
on  multisets  that  can  be  efficiently  computed  using  our  privacy-preserving  operations: 

$T ::= s \mid \mathrm{Rd}_d(T) \mid T \cap T \mid s \cup T \mid T \cup s$,

where $s$ represents any multiset held by some player, and $d \geq 1$. Note that any monotone function on multisets can be expressed using the grammar above, and thus our techniques for privacy-preserving set operations are truly general.

It is worth noting that the above grammar only allows computation of the union operator when at least one of the two operands is a set known to some player. Although any monotone function on sets can be described by our grammar, in some cases it is desirable (or more efficient) to enable the calculation of the union operator on two sets calculated from other set operations, such that neither operand is known to any player. In this case, we could calculate the union operation in the following way. Let $\lambda$ and $E_{pk}(f)$ be the encrypted polynomial representations of the two multisets. The players use standard techniques to privately obtain additive shares $f_1, \ldots, f_n$ of $f$, given $E_{pk}(f)$. Using these shares, they then calculate $(f_1 *_h \lambda) +_h \cdots +_h (f_n *_h \lambda) = f *_h \lambda$, the encryption of the polynomial representation of the union multiset.
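The identity behind this share-based union is plain distributivity: if $f = f_1 + \cdots + f_n$, then $\sum_i f_i * \lambda = f * \lambda$. The sketch below checks it on plaintext polynomials. It is an illustration under stated assumptions: $\lambda$ is treated as a plaintext polynomial rather than a ciphertext, the shares are produced by one party rather than by a joint sharing protocol, and the helper names are hypothetical.

```python
import random

Q = 2**61 - 1  # assumed prime modulus standing in for the plaintext ring R


def poly_from_roots(roots):
    """Coefficients (lowest degree first) of the product of (x - a) over the roots."""
    coeffs = [1]
    for a in roots:
        shifted = [0] + coeffs
        scaled = [(-a * c) % Q for c in coeffs] + [0]
        coeffs = [(u + v) % Q for u, v in zip(shifted, scaled)]
    return coeffs


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % Q
    return out


def poly_add(f, g):
    n = max(len(f), len(g))
    return [(a + b) % Q for a, b in zip(f + [0] * (n - len(f)), g + [0] * (n - len(g)))]


def additive_shares(f, n, rng):
    """Split f coefficient-wise into n polynomials that sum to f modulo Q."""
    shares = [[rng.randrange(Q) for _ in f] for _ in range(n - 1)]
    last = list(f)
    for share in shares:
        last = [(a - b) % Q for a, b in zip(last, share)]
    return shares + [last]


def union_from_shares(shares, lam):
    """Each holder multiplies its share by lam; the products are then summed."""
    acc = [0]
    for share in shares:
        acc = poly_add(acc, poly_mul(share, lam))
    return acc


f = poly_from_roots([1, 2])      # represents the multiset {1, 2}
lam = poly_from_roots([2, 3])    # represents the multiset {2, 3}
union = union_from_shares(additive_shares(f, 4, random.Random(5)), lam)
```

The result equals the polynomial with roots 1, 2, 2, 3, which is the representation of the multiset union, and no individual share reveals $f$ on its own.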
8.2 Private Subset Relation

Problem Statement Let the set $A$ be held by Alice. The set $B$ may be the result of an arbitrary function over multiple players' input sets (for example as calculated using the grammar above). The Subset problem is to determine whether $A \subseteq B$ without revealing any additional information.

Let $\lambda$ be the encryption of the polynomial $p$ representing $B$. Note that $A \subseteq B \Leftrightarrow \forall_{a \in A}\; p(a) = 0$. Alice thus evaluates the encrypted polynomial $\lambda$ at each element $a \in A$, homomorphically multiplies a random element by each encrypted evaluation, and adds these blinded ciphertexts to obtain $\beta'$. If $\beta'$ is an encryption of 0, then $A \subseteq B$. More formally:

1. For each element $a = A_j$ ($1 \leq j \leq |A|$), the player holding $A$:

(a) calculates $\beta_j = \lambda(a)$

(b) chooses a random element $b_j \leftarrow R$, and calculates $\beta'_j = b_j \times_h \beta_j$

2. The player holding $A$ calculates $\beta' = \beta'_1 +_h \cdots +_h \beta'_{|A|}$

3. All players together decrypt $\beta'$ to obtain $y$. If $y = 0$, then $A \subseteq B$.

This protocol can be easily extended to allow the set $A$ to be held by multiple players, such that $A = A_1 \cup \cdots \cup A_\nu$, where each set $A_i$ is held by a single player.
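The three steps of the Subset protocol reduce, on plaintexts, to a blinded sum of evaluations of $p$. The following sketch is only an illustration under stated assumptions: it evaluates $p$ in the clear instead of under encryption, uses a prime field in place of $R$, and draws each blinding factor $b_j$ nonzero (a simplification of uniform sampling from $R$). The function names are hypothetical.

```python
import random

Q = 2**61 - 1  # assumed prime modulus standing in for the plaintext ring R


def eval_from_roots(roots, x):
    """Evaluate p(x) = prod (x - b) over b in B, i.e. the polynomial representing B."""
    acc = 1
    for b in roots:
        acc = acc * (x - b) % Q
    return acc


def subset_check(A, B, rng):
    """Plaintext analogue of y = sum_j b_j * p(a_j); y == 0 indicates A is a subset of B."""
    y = 0
    for a in A:
        beta = eval_from_roots(B, a)       # beta_j = lambda(a), here in the clear
        blind = rng.randrange(1, Q)        # nonzero blinding factor b_j
        y = (y + blind * beta) % Q
    return y == 0


rng = random.Random(11)
inside = subset_check([2, 5], [2, 3, 5], rng)
outside = subset_check([2, 9], [2, 3, 5], rng)
```

When every element of $A$ is a root of $p$ the sum is exactly zero. With several non-members the blinded terms could in principle cancel, but only with negligible probability over the random $b_j$.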

8.3  Computation  of  CNF  Formulas 

Finally,  we  show  that  our  techniques  on  private  set  operations  have  applications  outside  of  the 
realm  of  set  computations.  As  a  concrete  example,  we  show  that  we  can  apply  our  techniques  to 
efficient  privacy-preserving  evaluation  of  boolean  formulas,  in  particular,  the  conjunctive  normal 
form  (CNF).  A  formula  in  CNF  is  a  conjunction  of  a  number  of  disjunctive  clauses,  each  of  which 
is  formed  of  several  variables  (or  their  negations). 

Problem Statement Let $\phi$ be a public CNF boolean formula on variables $V_1, \ldots, V_k$. Each player knows the truth assignment to some subset of $\{V_1, \ldots, V_k\}$, where each variable is known to at least one player. The players cooperatively calculate the truth value of $\phi$ under this assignment, without revealing any other information about the variable assignment.

We address this problem by introducing set representations of boolean formulas. Let True, False be distinct elements of $R$ (e.g., 0 and 1). For each variable in the formula, let the set representation of the variable be {True} if its value is true, and {False} if its value is false. Then, replace each $\vee$ operator in $\phi$ with a $\cup$ operator, and each $\wedge$ operator with a $\cap$ operator. If True is a member of the resulting set, then $\phi$ is true. The polynomial set representation of the CNF formula can now be evaluated by the players through use of our privacy-preserving multiset operations, as the function is described in the grammar given in Section 8.1.
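The set representation can be checked in the clear with ordinary multisets. The sketch below is an illustration only: it uses Python's `Counter` as a multiset (addition for union, `&` for intersection) and no cryptography. Representing a negated variable by the opposite singleton is an assumption made here for the example, as are the names `eval_cnf` and `literal_set`.

```python
from collections import Counter

TRUE, FALSE = 1, 0  # two distinct elements of R (assumption: 1 and 0)

def literal_set(assignment, var, negated=False):
    """Singleton multiset {TRUE} or {FALSE} for a literal.

    A negated variable is represented by the opposite singleton; this
    handling of negation is an assumption of this sketch.
    """
    value = assignment[var] != negated
    return Counter([TRUE if value else FALSE])

def eval_cnf(clauses, assignment):
    """Evaluate a CNF formula with at least one clause.

    clauses: list of clauses, each a list of (variable, negated) pairs.
    """
    clause_sets = []
    for clause in clauses:
        s = Counter()
        for var, negated in clause:
            s = s + literal_set(assignment, var, negated)  # OR becomes multiset union
        clause_sets.append(s)
    result = clause_sets[0]
    for s in clause_sets[1:]:
        result = result & s                                # AND becomes intersection
    return result[TRUE] > 0                                # is True in the result?
```

For example, `eval_cnf([[("V1", False), ("V2", True)], [("V2", False), ("V3", False)]], {...})` evaluates $(V_1 \vee \neg V_2) \wedge (V_2 \vee V_3)$.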

We  can  also  solve  many  variations  of  boolean  formula  evaluation  using  our  techniques.  For 
example,  we  might  require,  instead  of  using  the  boolean  operations,  that  at  least  t  of  the  variables  in 
a clause be satisfied. Note that using our techniques can be more efficient than standard multiparty techniques, as those techniques require an expensive multiplication operation, involving all players, to compute the $\wedge$ operator [2, 16].

A  Notation


•  P  -  the  set  of  elements  which  can  be  members  of  a  private  input  set 

•  k  -  size  of  each  private  input  set 

•  n  -  number  of  players  participating  in  a  protocol 

•  t  -  threshold number; an element must appear at least t times in the private input sets to be included in the threshold set

• $E_{pk}(\cdot)$ - encryption under the additively homomorphic, public key cryptosystem to which all players share a secret key

• $E_{pk}(a) +_h E_{pk}(b)$ - combination of two ciphertexts (under the homomorphic cryptosystem) to produce a re-randomized ciphertext which is the encryption of $a + b$

• $a \times_h E_{pk}(b)$ - combination of an integer and a ciphertext (under the homomorphic cryptosystem) to produce a re-randomized ciphertext which is the encryption of $ab$

• $f *_h E_{pk}(g)$ - combination of two polynomials (under the homomorphic cryptosystem) to produce a re-randomized encrypted polynomial which is the encryption of $f * g$

• $h(\cdot)$ - a cryptographic hash function from $\{0,1\}^*$ to $\{0,1\}^{\ell}$ ($\ell = \lg(1/\epsilon)$), where $\epsilon$ is negligible.

• $\mathrm{Rd}_d(S)$ denotes the element reduction by $d$ of set $S$

• $R^{a}[x]$ denotes the set of all polynomials of degree between 0 and $a$ with coefficients from $R$

•  [c]  for  an  integer  c  denotes  the  set  {1, . . . ,  c} 

•  a  :=  b  denotes  that  the  variable  a  is  given  the  value  b 

• $a \,\|\, b$ denotes $a$ concatenated with $b$

• $a \leftarrow S$ denotes that element $a$ is sampled uniformly from set $S$

• $f * g$ is the product of the polynomials $f$, $g$

•  deg(p)  is  degree  of  polynomial  p 

• $p^{(d)}$ is the $d$th formal derivative of $p$

•  gcd(p,  q)  is  the  greatest  common  divisor  of  p,  q 

• $S_i$ is the $i$th player’s private input set

• $V_j$ is the $j$th element of the set $V$, under some arbitrary ordering

B  Proof  of  Lemma 

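Theorem 2: Let $f, g$ be polynomials in $R[x]$ where $R$ is a ring, $\deg(f) = \deg(g) = \alpha$, and $\gcd(f, g) = 1$. Let $r = \sum_{i=0}^{\beta} r[i]x^i$ and $s = \sum_{i=0}^{\beta} s[i]x^i$, where $\forall_{0 \le i \le \beta}\ r[i] \leftarrow R$, $\forall_{0 \le i \le \beta}\ s[i] \leftarrow R$ (independently) and $\beta \ge \alpha$.

Let $u = f*r + g*s = \sum_{i=0}^{\alpha+\beta} u[i]x^i$. Then $\forall_{0 \le i \le \alpha+\beta}\ u[i]$ are distributed uniformly and independently over $R$.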

Proof. For clarity, we give a brief outline of the proof before proceeding to the details. Given any fixed polynomials $f, g, u$, we calculate the number $z$ of $r, s$ pairs such that $f*r + g*s = u$. We may then check that, given any fixed polynomials $f, g$, the total number of possible $r, s$ pairs, divided by $z$, is equal to the number of possible result polynomials $u$. This implies that, if $\gcd(f, g) = 1$ and we choose the coefficients of $r, s$ uniformly and independently from $R$, the coefficients of the result polynomial $u$ are distributed uniformly and independently over $R$.

We now determine the value of $z$, the number of $r, s$ pairs such that $f*r + g*s = u$. Let us assume that there exists at least one pair $r, s$ such that $f*r + g*s = u$. For any pair $r', s'$ such that $f*r' + g*s' = u$, we have

$f*r + g*s = f*r' + g*s'$
$f*(r - r') = g*(s' - s)$

As $\gcd(f, g) = 1$, we may conclude that $g \mid (r - r')$ and $f \mid (s' - s)$. Let $p*g = r - r'$ and $p*f = s' - s$. We now show that each polynomial $p$, of degree at most $\beta - \alpha$, determines exactly one unique pair $r', s'$ such that $f*r' + g*s' = u$. Note that $r' = r - g*p$, $s' = s + f*p$; as we have fixed $f, g, r, s$, a choice of $p$ determines both $r', s'$. If these assignments were not unique, there would exist distinct polynomials $p, p'$ such that either $r' = r - g*p = r - g*p'$ or $s' = s + f*p = s + f*p'$; either condition implies that $p = p'$, giving a contradiction. Thus the number of polynomials $p$, of degree at most $\beta - \alpha$, is exactly equivalent to the number of $r, s$ pairs such that $f*r + g*s = u$. As there are $|R|^{\beta-\alpha+1}$ such polynomials $p$, $z = |R|^{\beta-\alpha+1}$.

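We now show that the total number of $r, s$ pairs, divided by $z$, is equal to the number of result polynomials $u$. There are $|R|^{2\beta+2}$ $r, s$ pairs. As $|R|^{2\beta+2} / z = |R|^{2\beta+2} / |R|^{\beta-\alpha+1} = |R|^{\alpha+\beta+1}$ and there are $|R|^{\alpha+\beta+1}$ possible result polynomials, we have proved the theorem true. □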
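The counting argument can be checked exhaustively on a toy instance. The sketch below is only a sanity check under stated assumptions: it takes $R = GF(5)$, $f = x - 1$ and $g = x - 2$ (so $\alpha = 1$ and $\gcd(f, g) = 1$), enumerates every pair $r, s$ of degree at most $\beta$, and counts how often each result polynomial $u$ occurs. The helper names are hypothetical.

```python
from collections import Counter
from itertools import product

Q = 5  # small prime, so R = GF(5) can be enumerated exhaustively (assumption)

def poly_mul(f, g, q=Q):
    """Product of two coefficient lists (lowest degree first) modulo q."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % q
    return out

def poly_add(f, g, q=Q):
    """Sum of two coefficient lists of equal length modulo q."""
    return [(a + b) % q for a, b in zip(f, g)]

def count_results(f, g, beta, q=Q):
    """Map each u = f*r + g*s to the number of (r, s) pairs producing it."""
    counts = Counter()
    for coeffs in product(range(q), repeat=2 * (beta + 1)):
        r = list(coeffs[: beta + 1])
        s = list(coeffs[beta + 1:])
        u = poly_add(poly_mul(f, r, q), poly_mul(g, s, q), q)
        counts[tuple(u)] += 1
    return counts

f = [4, 1]  # x - 1 over GF(5)
g = [3, 1]  # x - 2 over GF(5)
counts = count_results(f, g, beta=1)
```

With $\beta = 1$, every one of the $5^{3}$ possible polynomials $u$ is produced by the same number $z = 5^{\beta - \alpha + 1} = 5$ of pairs, which is the uniformity the theorem states.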

C  Proofs  for  Set-Intersection  and  Cardinality  Set-Intersection 
Protocols 

C.1  Set-Intersection

In  this  section,  we  give  proofs  of  security  and  correctness  for  our  protocols  for  Set-Intersection  in  the 
honest-but-curious  and  malicious  cases.  For  simplicity,  we  give  proof  sketches  for  these  theorems. 

C.1.1  Honest-But-Curious  Case 

Theorem 6: In the Set-Intersection protocol of Fig. 1, every player learns the intersection of all players’ private inputs, $S_1 \cap S_2 \cap \cdots \cap S_n$, with overwhelming probability.

Proof. Each player learns the decrypted polynomial $p = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right)$. If $\forall_{i \in [n]}\ f_i(a) = 0$, then $p(a) = 0$. As no elements that are not in every player’s private input can be in the set-intersection of all private inputs, all elements in the set-intersection can be recovered by each player. Each element in his private input that is a root of $p$ is a member of the intersection set.

We now show that, with high probability, erroneous elements are not inserted into the answer set. By Theorem 2, the decrypted polynomial is of the form $\left(\prod_{a \in I} (x - a)\right) * s$, where $s$ is uniformly distributed over $R^{2k-|I|}[x]$. This random polynomial $s$ is of polynomial size, and thus has a polynomial number of roots. Each of these roots is a representation of an element from $P$ with only negligible probability. Thus, the probability that an erroneous element is included in the answer set is also negligible, and all players learn exactly the intersection set. □
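The correctness argument can be illustrated in the clear. The following sketch is a simplification under stated assumptions: it works over a prime field in place of $R$, performs no encryption, uses a single random polynomial `r_i` per player instead of the sum of several players' random polynomials, and does not encode elements with the redundancy the protocol relies on. All function names are hypothetical.

```python
import random

Q = 2**61 - 1  # prime modulus standing in for the ring R (assumption)

def poly_mul(f, g, q=Q):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % q
    return out

def poly_add(f, g, q=Q):
    n = max(len(f), len(g))
    return [((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)) % q
            for i in range(n)]

def poly_eval(f, x, q=Q):
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % q
    return acc

def poly_from_roots(roots, q=Q):
    f = [1]
    for a in roots:
        f = poly_mul(f, [(-a) % q, 1], q)  # multiply by (x - a)
    return f

def intersection_polynomial(sets, rng, q=Q):
    """p = sum_i f_i * r_i, with each r_i a fresh uniform polynomial of degree k."""
    k = max(len(s) for s in sets)
    p = [0]
    for s in sets:
        f_i = poly_from_roots(s, q)
        r_i = [rng.randrange(q) for _ in range(k + 1)]
        p = poly_add(p, poly_mul(f_i, r_i, q), q)
    return p

def recover_intersection(own_set, p, q=Q):
    """A player keeps exactly those of its own elements that are roots of p."""
    return {a for a in own_set if poly_eval(p, a, q) == 0}
```

Each player applies `recover_intersection` to its own input only; an element outside the intersection is a root of `p` only with probability about $1/Q$ in this sketch.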

Theorem 7: Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, with overwhelming probability, in the Set-Intersection protocol of Fig. 1, any coalition of fewer than n PPT honest-but-curious players learns no more information than would be gained by using the same private inputs in the ideal model with a trusted third party.

Proof. We assume that the homomorphic cryptosystem $(E, D)$ used in the protocol is in fact secure as we required. Thus, as the inputs of the other players are all encrypted until the decryption is performed, nothing can be learned by any player before that point. Each player then learns only the summed polynomial $p = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right)$.

Note that to every coalition of $c$ players, for every $i$, $\sum_{j=0}^{c} r_{i+j,j}$ is completely random, as at least one player in the $c + 1$ players who chose that random polynomial is not a member of the coalition, and so $\sum_{j=0}^{c} r_{i+j,j}$ is uniformly distributed and unknown.

By Theorem 2, $p = \sum_{i=1}^{n} f_i * \left(\sum_{j=0}^{c} r_{i+j,j}\right) = \left(\prod_{a \in I} (x - a)\right) * s$, where $I$ is the intersection set and $s$ is uniformly distributed over the polynomials of appropriate degree. Thus no information about the private inputs of the honest players can be recovered from $p$, other than that given by revealing the intersection set. □

C.1.2  Malicious  Case 

Theorem 16: Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, and the specified zero-knowledge proofs and proofs of correct decryption cannot be forged, then in the Set-Intersection protocol for the malicious case in Fig. 1, for any coalition $\Gamma$ of colluding players (at most $n - 1$ such colluding parties), there is a player (or group of players) $G$ operating in the ideal model, such that the views of the players in the ideal model are computationally indistinguishable from the views of the honest players and $\Gamma$ in the real model.

Proof. In this simulation proof, we give an algorithm for a player $G$ in the ideal model. This player communicates with the malicious players $\Gamma$, pretending to be one or more honest players in such a fashion that $\Gamma$ cannot distinguish that he is not in the real world. We assume that all malicious players can collude. The trusted third party takes the input from $G$ and the honest parties, and gives both $G$ and the honest parties the intersection set. $G$ then communicates with the malicious players $\Gamma$, so they also learn the intersection set. A graphical representation of these players is given in Figure 10.

Figure 10: A simulation proof defines the behavior of the player $G$, who translates between the malicious players $\Gamma$, who believe they are operating in the real model, and the ideal model, in which the trusted third party computes the desired answer.

We give a sketch of how the player $G$ operates (note that $G$ can prevaricate when opening commitments, as we use an equivocal commitment scheme, and can extract plaintext from proofs of plaintext knowledge):

1. For each simulated honest player $i$, $G$:

(a) chooses a polynomial $f_i$ such that each such polynomial is relatively prime and has leading coefficient 1 (for randomly generated polynomials with leading coefficient 1, this is true with overwhelming probability)

(b) chooses arbitrary polynomials $r_{i,1}, \ldots, r_{i,n}$, and creates encryptions $\Lambda(r_{i,j})$ from them (in the case of Paillier, specially construct encryptions of those polynomials, and proofs of knowledge of each coefficient, see Section 7.1)

2. Performs step 1 of the protocol:

(a) sends the encryption of $f_i$ to all malicious players $\Gamma$, along with proofs of plaintext knowledge and commitments to $\Lambda(r_{i,j})$ ($1 \le j \le n$)

(b) sends data items $\Lambda(r_{i,j})$ ($1 \le j \le n$) to all malicious players $\Gamma$

(c) Receives from each malicious player $\alpha \in \Gamma$:

i. encryption of a polynomial $f_\alpha$ and proofs of plaintext knowledge for its coefficients

ii. trapdoor commitments to data items $\Lambda(r_{\alpha,j})$ for each random polynomial $r_{\alpha,j}$, $1 \le j \le n$

3. The player $G$ extracts from the proofs of plaintext knowledge and trapdoor commitments to $\Lambda(r_{\alpha,j})$ (in the case of Paillier, the extraction is from the proof of knowledge of the discrete logarithm), the polynomials $f_\alpha$, and the random polynomials $r_{\alpha,j}$ the malicious players $\Gamma$ have chosen.

4. $G$ obtains the roots of each polynomial $f_\alpha$ (as these exactly determine, for the purposes of the protocol, his set):

• If polynomial factoring is possible, $G$ may factor $f_\alpha$. $f_\alpha(a) = 0 \Leftrightarrow (x - a) \mid f_\alpha$, so all roots of $f_\alpha$ may be determined by examining the linear factors.

• If we are working in the random oracle model, then, with overwhelming probability, to correctly represent any element of the valid set $P$, a player must consult the random oracle. As there can be only a polynomial number of such queries, for each query $a$, $G$ may check if $f_\alpha(a \,\|\, h(a)) = 0$.

• If neither of these routes is feasible, then a proof that $f_\alpha$ was constructed by multiplying $k$ linear factors of the form $x - a$ may be added to the protocol instead of proofs of plaintext knowledge. This proof is of size $O(k^2)$, and is constructed by using proofs of plaintext knowledge for some linear factors, and layering proofs of correct multiplication to obtain the complete polynomial $f_\alpha$. From this proof, each linear factor of $f_\alpha$ can be obtained, and thus all roots of $f_\alpha$.

5. $G$ submits the sets represented by these roots to the trusted third party. The honest players submit their private input sets to the trusted third party. The trusted third party returns the intersection set $I$ to $G$ and the honest players.

6. $G$ prepares to reveal the intersection set to the malicious players $\Gamma$:

(a) selects a target polynomial $p = \left(\prod_{a \in I} (x - a)\right) * s$, where $s$ is chosen uniformly from those polynomials of degree $2k - |I|$ (note that, by Theorem 2, this is exactly the polynomial calculated by simply running the protocol)

(b) chooses a set of polynomials $r_{i,j}$ (where $i$ is one of the simulated honest players) such that $\sum_{j=1}^{n} f_j * \left(\sum_{i=1}^{n} r_{i,j}\right) = p$ (from the proof of Theorem 2, we know that such polynomials exist, and can be determined through simple polynomial manipulation)

7. $G$ follows the rest of the protocol with the malicious players $\Gamma$ as written, except that he opens the trapdoor commitment to reveal an appropriate $\Lambda(r_{i,j})$ for the new chosen $r_{i,j}$. In this way, the players calculate an encryption of the polynomial $p$ chosen by $G$, and then decrypt it. The coalition players thus learn the intersection set.

Note that the dishonest players cannot distinguish that they are talking to $G$ (who is working in the ideal model) instead of other clients (in the real world), and the correct answer is learned by all parties, in both the real and ideal models.

□ 


C.2  Cardinality  Set-Intersection 

In this section, we give proofs of security and correctness for our protocols for Cardinality Set-Intersection in the honest-but-curious and malicious cases. For simplicity, we give proof sketches for these theorems.

C.2.1  Honest-But-Curious  Case 

Theorem 8: In the Cardinality Set-Intersection protocol of Fig. 2, every player learns the size of the intersection of all players’ private inputs, $|S_1 \cap S_2 \cap \cdots \cap S_n|$, with overwhelming probability.

Proof. Note that, following the proof of Theorem 6, $p$ is a polynomial representation of the intersection multiset, with overwhelming probability. Each player evaluates $p$ (encrypted) at each of their inputs, then blinds it by homomorphically multiplying a random element by the encrypted evaluation. Thus each resulting encrypted element $(V_i)_j$ ($1 \le i \le n$, $1 \le j \le k$) is either 0, representing some element of a private input set in the intersection set, or uniformly distributed, representing some element not in the intersection set. An element is a member of $S_1 \cap \cdots \cap S_n$ if and only if each player holds it as part of their private input set; thus, for each element of $S_1 \cap \cdots \cap S_n$, there are $n$ encrypted evaluations that are 0. Thus, when the encrypted evaluations $(V_i)_j$ ($1 \le i \le n$, $1 \le j \le k$) are shuffled and decrypted, there are exactly $n|S_1 \cap \cdots \cap S_n|$ 0s, and thus all players learn the size of the intersection set. □
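The counting step of this proof can be sketched in the clear. The code below is an illustration under stated assumptions: a prime field replaces $R$, nothing is encrypted, each player contributes a single random polynomial, every input set has exactly $k$ distinct elements, and Python's `shuffle` stands in for the Shuffle protocol. The function names are hypothetical.

```python
import random

Q = 2**61 - 1  # prime modulus standing in for the ring R (assumption)

def poly_mul(f, g, q=Q):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % q
    return out

def poly_eval(f, x, q=Q):
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % q
    return acc

def intersection_cardinality(sets, rng, q=Q):
    n, k = len(sets), len(sets[0])
    assert all(len(s) == k for s in sets), "sketch assumes equal-size input sets"
    # p = sum_i f_i * r_i, a randomized representation of the intersection
    p = [0] * (2 * k + 1)
    for s in sets:
        f_i = [1]
        for a in s:
            f_i = poly_mul(f_i, [(-a) % q, 1], q)
        r_i = [rng.randrange(q) for _ in range(k + 1)]
        for idx, c in enumerate(poly_mul(f_i, r_i, q)):
            p[idx] = (p[idx] + c) % q
    # Every player blinds the evaluation of p at each of its own elements.
    blinded = [rng.randrange(q) * poly_eval(p, a, q) % q for s in sets for a in s]
    rng.shuffle(blinded)  # stand-in for the Shuffle protocol
    zeros = sum(1 for v in blinded if v == 0)
    return zeros // n     # each intersection element yields n zeros
```

Only the multiset of shuffled values is inspected, so the count of zeros is the sole information used, matching the argument above.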

Theorem 9: Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure and that the Shuffle protocol is secure, with overwhelming probability, in the Cardinality Set-Intersection protocol of Fig. 2, any coalition of fewer than n PPT honest-but-curious players learns no more information than would be gained by using the same private inputs in the ideal model with a trusted third party.

Proof. We assume that the cryptosystem $E_{pk}(\cdot)$ and Shuffle protocol are secure, so we may note that no player or coalition of players learns any information from the protocol except the decryption of the randomly-ordered set $\{(V_i)_j\}_{i \in [n], j \in [k]}$. As each element of that set is either 0 or a uniformly distributed element, it conveys no information other than the statement ‘some player had an element in their private input set that was/was not in the intersection set’. As this information precisely constitutes the result of the Cardinality Set-Intersection problem, no additional information is revealed. □

C.2.2  Malicious  Case

Theorem 17: Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, the Shuffle protocol is secure, and the specified zero-knowledge proofs and proofs of correct decryption cannot be forged, then in the Cardinality Set-Intersection protocol for the malicious case in Fig. 8, for any coalition $\Gamma$ of colluding players (at most $n - 1$ such colluding parties), there is a player (or group of players) $G$ operating in the ideal model, such that the views of the players in the ideal model are computationally indistinguishable from the views of the honest players and $\Gamma$ in the real model.

Proof. The simulation proof of this theorem follows the proof of Theorem 16 with only small changes; the additional zero-knowledge proofs in the protocol are generally irrelevant to the simulator. □

D  Proofs  for  the  Over-Threshold  Set-Union  and  Threshold
Set-Union  Protocols

D.1  Over-Threshold  Set-Union
D.1.1  Honest-But-Curious  Case

Theorem 10: In the Over-Threshold Set-Union protocol of Fig. 3, every honest-but-curious player learns each element $a$ which appears at least $t$ times in the union of the $n$ players’ private inputs, as well as the number of times it so appears, with overwhelming probability.

Proof. All players calculate and decrypt $\Phi = F * p^{(t-1)} * \left(\sum_i r_i\right) + p * \left(\sum_i s_i\right)$. As $\sum_i r_i$ and $\sum_i s_i$ are distributed uniformly over all polynomials of approximate size $nk$, Theorem 2 tells us that $\Phi = \gcd(p^{(t-1)}, p) * r$, where $r$ is a random polynomial of the appropriate size. As $r$ has only a polynomial number of roots, each of which has a negligible probability of representing a member of $P$, when $\Phi$ is factored, $\gcd(p^{(t-1)}, p)$ can be recovered.

By Theorem 5, $\gcd(p^{(t-1)}, p)$ has roots which are exactly those that appear at least $t$ times in the players’ private inputs (the threshold set). The players calculate elements $U_{i,j}$, which are uniformly distributed if $(S_i)_j$ is not a member of the threshold set, and $(S_i)_j$ if it does appear in the threshold set. These elements are shuffled and distributed to all players. Each reveals an element of the private input, if that element is in the threshold set, and nothing otherwise. Thus each element in the threshold intersection set is revealed as many times as it appeared in the private inputs. □

Theorem 11: Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, with overwhelming probability, in the Over-Threshold Set-Union protocol of Fig. 3, any coalition of fewer than n PPT honest-but-curious players learns no more information than would be gained by using the same private inputs in the ideal model with a trusted third party.

Proof. We assume that the cryptosystem employed is semantically secure, and so players learn only the formula $\Phi = F * p^{(t-1)} * \left(\sum_i r_i\right) + p * \left(\sum_i s_i\right)$. Note that both $\sum_i r_i$, $\sum_i s_i$ are uniformly distributed and unknown to all players, as the maximum coalition size is smaller than $c + 1$. Thus, by Theorem 2, $\Phi = \gcd(p, p^{(t-1)} * F) * s$, for some uniformly distributed polynomial $s$. As $s$ is uniformly distributed for any player inputs, no player or coalition can learn more than $\gcd(p, p^{(t-1)} * F)$. $F$ is chosen such that $\gcd(p, F) = 1$, and so $\gcd(p, p^{(t-1)} * F) = \gcd(p, p^{(t-1)})$. As was observed in Theorem 10, this information exactly represents the threshold set, and can thus be derived from the answer that would be returned by a trusted third party. Thus no player or coalition of at most $c$ players can learn more than in the ideal model.

Neither  do  the  shuffled  elements  reveal  additional  information.  As  we  assume  the  shuffling 
protocol  is  secure,  the  origin  of  any  element  is  not  revealed.  The  elements  revealed  are  exactly 
those  in  the  threshold  set,  each  included  as  many  times  as  it  was  included  in  the  private  inputs, 
and  thus  also  do  not  reveal  information  to  any  adversary.  □ 
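The role of the polynomial $\Phi$ in the proofs of Theorems 10 and 11 can be illustrated in the clear. The sketch below is a simplification under stated assumptions: a prime field replaces $R$, nothing is encrypted, single random polynomials `r` and `s_rand` stand in for the sums of the players' random polynomials, and `F` is taken to be a random monic polynomial of degree $t - 1$, which is coprime to $p$ only with high probability. It tests membership only and does not reproduce the shuffling step; its correctness for general $t$ rests on Theorem 5 as cited above, and the names are hypothetical.

```python
import random

Q = 2**61 - 1  # prime modulus standing in for the ring R (assumption)

def poly_mul(f, g, q=Q):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % q
    return out

def poly_add(f, g, q=Q):
    n = max(len(f), len(g))
    return [((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)) % q
            for i in range(n)]

def poly_eval(f, x, q=Q):
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % q
    return acc

def derivative(f, q=Q):
    """Formal derivative: sum c_i x^i becomes sum i*c_i x^(i-1)."""
    return [(i * c) % q for i, c in enumerate(f)][1:] or [0]

def threshold_polynomial(sets, t, rng, q=Q):
    """Phi = F * p^(t-1) * r + p * s, with p representing the multiset union."""
    p = [1]
    for s in sets:
        for a in s:
            p = poly_mul(p, [(-a) % q, 1], q)  # union = product of polynomials
    d = p
    for _ in range(t - 1):
        d = derivative(d, q)
    F = [rng.randrange(q) for _ in range(t - 1)] + [1]  # assumed form of F
    r = [rng.randrange(q) for _ in range(len(p))]
    s_rand = [rng.randrange(q) for _ in range(len(p))]
    return poly_add(poly_mul(poly_mul(d, F, q), r, q), poly_mul(p, s_rand, q), q)

def threshold_set(sets, t, rng, q=Q):
    """Input elements that are roots of Phi."""
    phi = threshold_polynomial(sets, t, rng, q)
    return {a for s in sets for a in s if poly_eval(phi, a, q) == 0}
```

For $t = 2$ the check is easy to follow by hand: an element held at least twice is a root of both $p$ and $p^{(1)}$, while an element held once is a root of $p$ but not of $p^{(1)}$, so $\Phi$ vanishes there only if the random factors do.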

D.1.2  Malicious  Case 

Theorem 18: Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure, the Shuffle protocol is secure, and the specified zero-knowledge proofs and proofs of correct decryption cannot be forged, then in the Over-Threshold Set-Union protocol for the malicious case in Fig. 8, for any coalition $\Gamma$ of colluding players (at most $n - 1$ such colluding parties), there is a player (or group of players) $G$ operating in the ideal model, such that the views of the players in the ideal model are computationally indistinguishable from the views of the honest players and $\Gamma$ in the real model.

Proof. In this simulation proof, we give an algorithm for a player $G$ in the ideal model. This player communicates with the malicious players $\Gamma$, pretending to be one or more honest players in such a fashion that $\Gamma$ cannot distinguish that he is not in the real world. We assume that all malicious players can collude. The trusted third party takes the input from $G$ and the honest parties, and gives both $G$ and the honest parties the result set. $G$ then communicates with the malicious players $\Gamma$, so they also learn the result set. A graphical representation of these players is given in Figure 10.

We  give  a  sketch  of  how  the  player  G  operates  (note  that  G  can  prevaricate  when  opening 
commitments,  as  we  use  an  equivocal  commitment  scheme,  and  can  extract  plaintext  from  proofs 
of  plaintext  knowledge): 

1.  For  each  simulated  honest  player  i,  G: 

(a) chooses a set $S'_i$ of arbitrary elements $(S'_i)_1, \ldots, (S'_i)_k \in R$

(b) performs steps 1-2 of the protocol, sending equivocal commitments to the set $S'_i$ for each simulated honest player.

2. The player $G$ extracts the private input sets chosen by $\Gamma$, for each malicious player, from the equivocal commitments sent in step 2 of the protocol. $G$ submits the sets extracted from these commitments to the trusted third party. The honest players submit their private input sets to the trusted third party. The trusted third party returns the result set $I$ to $G$ and the honest players.

3. $G$ prepares to reveal the result set to the malicious players $\Gamma$: $G$ chooses new sets $S_i$ to replace the sets used to construct the commitment. These sets are chosen to contain the following elements:

(a) for each element $a$ that appears $b > 0$ times in $I$, and $b_\Gamma$ times in the private input multisets of the malicious players ($\Gamma$), the element $a$ is included $b + t - 1 - b_\Gamma$ times in the multisets $S_i$

(b) all elements not specified by the prior rule are chosen uniformly from $R$

4. $G$ follows the rest of the protocol with the malicious players $\Gamma$ as written. The coalition players thus learn the result set.

Note that the dishonest players cannot distinguish that they are talking to $G$ (who is working in the ideal model) instead of other clients (in the real world), and the correct answer is learned by all parties, in both the real and ideal models.

□ 


D.2  Threshold  Set-Union 

Theorem 12: In the Threshold Contribution Threshold Set-Union protocol of Fig. 5, every player $i$ ($1 \le i \le n$) learns the set $S_i \cap \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability.

Proof. Note that the encrypted computation is performed in accordance with Theorems 3 and 5, and thus the polynomial $\Phi$ is a polynomial representation of the multiset $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability. Each player $i$ ($1 \le i \le n$) constructs encrypted evaluations of each $a \in S_i$, which are then homomorphically multiplied by a uniformly distributed element by all players. Thus, each ciphertext constructed in this fashion is either 0 (meaning $a \in \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$) or uniformly distributed (meaning $a \notin \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$). These ciphertexts are then decrypted; thus, each player $i$ learns which elements of his private input appear in the threshold set $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability. □

Theorem 13: In the Semi-Perfect Threshold Set-Union protocol of Fig. 4, each player $i$ ($1 \le i \le n$) learns the set $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability.

Proof. Following the proof of Theorem 12, the polynomial $\Phi$ is a polynomial representation of the multiset $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability and each shuffled element $T \,\|\, U$ is of one of the following forms:

• For some $a \in S_1 \cup \cdots \cup S_n$, $1 \le i \le n$, $T = Enc_i(h(a) \,\|\, a)$, $U$ is an $E_{pk}(a)$ - thus, $a \in \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$

• For some $a \in S_1 \cup \cdots \cup S_n$, $1 \le i \le n$, $T = Enc_i(h(a) \,\|\, a)$, $U$ is not an $E_{pk}(a)$ - thus, $a \notin \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$

The operation of Step 8 assures that for each $a \in \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, a corresponding $U$ is correctly decrypted exactly once - all other decryptions of $a$ are sabotaged to appear uniformly distributed. Thus, all players learn the elements of the set $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability. □

Theorem 14: In the Perfect Threshold Set-Union protocol of Fig. 6, every player learns the set $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability.

Proof. Following the proof of Theorem 12, the polynomial $\Phi$ is a polynomial representation of the multiset $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, with overwhelming probability and each shuffled (encrypted) element $U'_i$ ($1 \le i \le nk$) is of one of the following forms: $a \in P$ (indicating that $a \in \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$), or a uniformly distributed element (which can be distinguished from a representation of an element of $P$
with  overwhelming  probability).  Note  that,  if  is  an  encryption  of  an  element  a,  and 
such that is also an encryption of $a$, then $W_i$ is also an encryption of $a$. (Otherwise, $W_i$ is an encryption of a uniformly distributed element.)

This calculation results in a list of encrypted elements $W_i$, each of which is of one of the following forms: $a \in P$ (indicating that both: $a \in \mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$, and $W_i$ is with overwhelming probability the only encryption of $a$ in the list), or a uniformly distributed element. Thus, when the players decrypt the list $W_i$, they learn all elements of $\mathrm{Rd}_{t-1}(S_1 \cup \cdots \cup S_n)$ exactly once, with overwhelming probability. □

Theorem 15: Assuming that the additively homomorphic, threshold cryptosystem $E_{pk}(\cdot)$ is semantically secure and that the Shuffle protocol is secure, with overwhelming probability, in the Threshold Set-Union protocols of Figs. 4, 5, and 6, any coalition of fewer than n PPT honest-but-curious players learns no more information than would be gained by using the same private inputs in the ideal model with a trusted third party.

Proof. Note that in the threshold contribution and perfect variants of Threshold Set-Union, all data is encrypted until the final result sets are revealed through joint decryption. As shown in Theorems 12 and 14, the final sets correspond exactly to the elements revealed (all elements that are not in the result set are uniformly distributed over $R$, and thus hold no information); thus, no information except the result set is revealed to the players.

In the protocol for semi-perfect Threshold Set-Union, the result set is not decrypted all-at-once, but one element at a time. Theorem 13 shows that the resulting elements correspond exactly to
the  desired  result  set,  but  we  must  show  that  the  behavior  of  each  player  during  the  process  of 
decryption  yields  no  disallowed  information.  Note  that  we  require  for  the  security  of  this  protocol 
that  a  dishonest  coalition  hold  no  more  than  t  —  1  copies  of  any  given  element  in  their  private  input 
sets. 

When  performing  the  decryption  process,  each  player  learns  two  pieces  of  information  when 
a  result  set  element  is  revealed:  the  element,  and  whether  the  element  revealed  came  from  that 
player’s  own  private  input  multiset.  Each  ciphertext  is  ‘tagged’,  so  each  player  can  easily  decide 
whether  they  constructed  that  ciphertext.  Thus,  if  a  dishonest  coalition  held  at  least  t  copies  of  any 
given  element,  they  could  determine  that  at  least  one  other  player  also  held  a  copy  of  that  element, 
revealing  forbidden  information.  However,  as  we  have  precluded  this  situation,  no  information  is 
revealed;  if  a  dishonest  coalition  holds  t  —  1  copies  of  an  element  which  appears  in  the  result  set, 
they  already  know  that  at  least  one  other  player  holds  it  (otherwise  it  would  not  appear  in  the
result  set!).  □